We consider a time series $X = \{X_k, k \in \mathbb{Z}\}$ with memory parameter
$d_0 \in \mathbb{R}$. This time series is either stationary or can be made
stationary after differencing a finite number of times. We study the "local
Whittle wavelet estimator" of the memory parameter $d_0$. This is a
wavelet-based semiparametric pseudo-likelihood maximum method estimator.
The estimator may depend on a given finite range of scales or on a range
which becomes infinite with the sample size. We show that the estimator is
consistent and rate optimal if $X$ is a linear process, and is asymptotically
normal if $X$ is Gaussian.

1. Introduction. Let $X = \{X_k\}_{k \in \mathbb{Z}}$ be a process, not necessarily
stationary or invertible. Denote by $\Delta X$ the first-order difference,
$(\Delta X)_\ell = X_\ell - X_{\ell-1}$, and by $\Delta^k X$ the $k$th-order difference.
Following [9], the process $X$ is said to have memory parameter $d_0$,
$d_0 \in \mathbb{R}$, if for any integer $k > d_0 - 1/2$, $\Delta^k X$ is covariance
stationary with spectral measure

(1)   $\nu_{\Delta^k X}(d\lambda) = |1 - e^{-i\lambda}|^{2(k - d_0)} \nu^*(d\lambda), \quad \lambda \in [-\pi, \pi]$,

where $\nu^*$ is a nonnegative symmetric measure on $[-\pi, \pi]$ such that, in a
neighborhood of the origin, it admits a positive and bounded density. The
process $X$ is covariance stationary if and only if $d_0 < 1/2$. When $d_0 > 0$, $X$
is said to exhibit long memory or long-range dependence. The generalized
spectral measure of $X$ is defined as

(2)   $\nu(d\lambda) = |1 - e^{-i\lambda}|^{-2d_0} \nu^*(d\lambda), \quad \lambda \in [-\pi, \pi]$.



Received July 2007; revised July 2007. 
Supported in part by NSF Grant DMS-05-05747. 

AMS 2000 subject classifications. Primary 62M15, 62M10, 62G05; secondary 62G20,
60G18.

Key words and phrases. Long memory, semiparametric estimation, wavelet analysis. 

This is an electronic reprint of the original article published by the 
Institute of Mathematical Statistics in The Annals of Statistics, 
2008, Vol. 36, No. 4, 1925-1956. This reprint differs from the original in 
pagination and typographic detail. 






E. MOULINES, F. ROUEFF AND M. S. TAQQU 



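We suppose that we observe $X_1, \dots, X_n$ and want to estimate the exponent
$d_0$ under the following semiparametric set-up introduced in [15]. Let
$\beta \in (0, 2]$, $\gamma > 0$ and $\varepsilon \in (0, \pi]$, and assume that
$\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$,
where $\mathcal{H}(\beta, \gamma, \varepsilon)$ is the class of finite nonnegative symmetric
measures on $[-\pi, \pi]$ whose restrictions on $[-\varepsilon, \varepsilon]$ admit a density $g$
such that, for all $\lambda \in (-\varepsilon, \varepsilon)$,

(3)   $|g(\lambda) - g(0)| \le \gamma\, g(0)\, |\lambda|^{\beta}$.

Since $\varepsilon \le \pi$, $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ is only a local condition for $\lambda$ near 0. For
instance, $\nu^*$ may contain atoms at frequencies in $(\varepsilon, \pi]$ or have an unbounded
density on this domain.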

We shall estimate do using the semiparametric local Whittle wavelet es- 
timator defined in Section 3. We will show that under suitable conditions, 
this estimator is consistent (Theorem 3), the convergence rate is optimal 
(Corollary 4) and it is asymptotically normal (Theorem 5). In Section 4, we 
discuss how it compares to other estimators. 

There are two popular semiparametric estimators for the memory param- 
eter do in the frequency domain: 

(1) the Geweke-Porter-Hudak (GPH) estimator introduced in [6] and ana- 
lyzed in [16], which involves a regression of the log-periodogram on the 
log of low frequencies; 

(2) the local Whittle (Fourier) estimator (or LWF) proposed in [11] and 
developed in [15], which is based on the Whittle approximation of the 
Gaussian likelihood, restricted to low frequencies. 

Corresponding approaches may be considered in the wavelet domain. By far, 
the most widely used wavelet estimator is based on the log-regression of the 
wavelet coefficient variance on the scale index, which was introduced in [1]; 
see also [14] and [13] for recent developments. A wavelet analog of the LWF,
referred to as the local Whittle wavelet estimator, can also be defined. This
estimator was proposed for analyzing noisy data in a parametric context 
in [23] and was considered by several authors, essentially in a parametric 
context (see, e.g., [10] and [12]). To our knowledge, its theoretical properties 
are not known (see the concluding remarks in [22], page 107). The main 
goal of this paper is to fill this gap in a semiparametric context. The paper 
is structured as follows. In Section 2, the wavelet analysis of a time series 
is presented and some results on the dependence structure of the wavelet 
coefficients are given. The definition and the asymptotic properties of the 
local Whittle wavelet estimator are given in Section 3: the estimator is shown 
to be rate optimal under a general condition on the wavelet coefficients,
which is satisfied when X is a linear process with four finite moments, and
it is shown to be asymptotically normal under the additional condition that 
X is Gaussian. These results are discussed in Section 4. The proofs can be 
found in the remaining sections. The linear case is considered in Section 5. 
The asymptotic behavior of the wavelet Whittle likelihood is studied in 
Section 6 and weak consistency is studied in Section 7. The proofs of the 
main results are gathered in Section 8. 

2. The wavelet analysis. The functions $\phi(t)$, $t \in \mathbb{R}$, and $\psi(t)$, $t \in \mathbb{R}$, will
denote the father and mother wavelets respectively, and
$\hat\phi(\xi) = \int_{\mathbb{R}} \phi(t) e^{-i\xi t}\, dt$ and $\hat\psi(\xi) = \int_{\mathbb{R}} \psi(t) e^{-i\xi t}\, dt$
their Fourier transforms. We suppose that $\phi$ and $\psi$ satisfy the following
assumptions:

(W-1) $\phi$ and $\psi$ are integrable and have compact supports,
$\hat\phi(0) = \int_{\mathbb{R}} \phi(x)\, dx = 1$ and $\int_{\mathbb{R}} \psi^2(x)\, dx = 1$;

(W-2) there exists $\alpha > 1$ such that $\sup_{\xi \in \mathbb{R}} |\hat\psi(\xi)| (1 + |\xi|)^{\alpha} < \infty$;

(W-3) the function $\psi$ has $M$ vanishing moments, that is,
$\int_{\mathbb{R}} t^l \psi(t)\, dt = 0$ for all $l = 0, \dots, M-1$;

(W-4) the function $\sum_{k \in \mathbb{Z}} k^l \phi(\cdot - k)$ is a polynomial of degree $l$ for all
$l = 0, \dots, M-1$;

(W-5) $d_0$, $M$, $\alpha$ and $\beta$ are such that $(1 + \beta)/2 - \alpha < d_0 \le M$.

Assumption (W-1) implies that $\hat\phi$ and $\hat\psi$ are everywhere infinitely
differentiable. Assumption (W-2) is regarded as a regularity condition and
assumptions (W-3) and (W-4) are often referred to as admissibility conditions.
When (W-1) holds, assumptions (W-3) and (W-4) can be expressed in different
ways. (W-3) is equivalent to asserting that the first $M - 1$ derivatives
of $\hat\psi$ vanish at the origin and hence

(4)   $|\hat\psi(\lambda)| = O(|\lambda|^{M})$ as $\lambda \to 0$.

And, by [3], Theorem 2.8.1, page 90, (W-4) is equivalent to

(5)   $\sup_{k \ne 0} |\hat\phi(\lambda + 2k\pi)| = O(|\lambda|^{M})$ as $\lambda \to 0$.

Finally, (W-5) is the constraint on $M$ and $\alpha$ that we will impose on the
wavelet-based estimator of the memory parameter $d_0$ of a process having
generalized spectral measure (2) with $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ for some positive $\beta$, $\gamma$
and $\varepsilon$. Remarks 1 and 7 below provide some insights into (W-5). We may
consider nonstationary processes $X$ because the wavelet analysis performs
an implicit differentiation of order $M$. It is perhaps less well known that,
in addition, wavelets can be used with noninvertible processes ($d_0 \le -1/2$)
due to the regularity condition (W-2). These two properties of the wavelet
are, to some extent, similar to the properties of the tapers used in Fourier
analysis (see, e.g., [9, 22]).

Adopting the engineering convention that large values of the scale index $j$
correspond to coarse scales (low frequencies), we define the family
$\{\psi_{j,k}, j \in \mathbb{Z}, k \in \mathbb{Z}\}$ of translated and dilated functions,
$\psi_{j,k}(t) = 2^{-j/2} \psi(2^{-j} t - k)$, $j \in \mathbb{Z}$, $k \in \mathbb{Z}$. If $\phi$ and $\psi$ are the
scaling and wavelet functions associated with a multiresolution analysis
(see [3]), then $\{\psi_{j,k}, j \in \mathbb{Z}, k \in \mathbb{Z}\}$ forms an orthogonal basis in
$L^2(\mathbb{R})$. A standard choice is the family of Daubechies wavelets (DB-M),
which are parameterized by the number of their vanishing moments $M$. The
associated scaling and wavelet functions $\phi$ and $\psi$ satisfy (W-1)-(W-4),
where $\alpha$ in (W-2) is a function of $M$ which increases to infinity as $M$
tends to infinity (see [3], Theorem 2.10.1). In this work, however, we neither
assume that the pair $\{\phi, \psi\}$ is associated with a multiresolution analysis
(MRA), nor that the $\psi_{j,k}$'s form a Riesz basis. Other possible choices are
discussed in [14], Section 3.

The wavelet coefficients of the process $X = \{X_\ell, \ell \in \mathbb{Z}\}$ are defined by

(6)   $W_{j,k} = \int_{\mathbb{R}} \mathbf{X}(t) \psi_{j,k}(t)\, dt, \quad j \ge 0,\ k \in \mathbb{Z}$,

where $\mathbf{X}(t) = \sum_{k \in \mathbb{Z}} X_k \phi(t - k)$. If $(\phi, \psi)$ define an MRA, then $X_k$ is
identified with the $k$th approximation coefficient at scale $j = 0$ and $W_{j,k}$ are
the details coefficients at scale $j$.
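As an illustration of definition (6), the following minimal sketch evaluates the wavelet coefficients for the Haar pair, with $\phi$ the indicator of $[-1, 0]$ and $\psi$ equal to $+1$ on $[0, 1/2)$ and $-1$ on $[1/2, 1)$. This choice is an assumption made only to keep the arithmetic transparent: the Haar wavelet has $M = 1$ and decay exponent $\alpha = 1$, so it does not satisfy (W-2). The function name `haar_wavelet_coefficients` is hypothetical.

```python
def haar_wavelet_coefficients(x, j):
    """Return [W_{j,0}, W_{j,1}, ...] from (6) for the Haar pair, scale j >= 1.

    x[0] holds X_1, x[1] holds X_2, and so on. With phi the indicator of
    [-1, 0], the interpolated process X(t) equals X_l on (l-1, l], so the
    integral in (6) reduces to a difference of two block sums.
    """
    if j < 1:
        raise ValueError("this sketch handles scales j >= 1 only")
    block = 2 ** j               # observations covered by the support of psi_{j,k}
    half = block // 2
    n_j = len(x) // block        # number of complete blocks available
    scale = 2.0 ** (-j / 2.0)    # normalization 2^{-j/2} of psi_{j,k}
    coeffs = []
    for k in range(n_j):
        start = k * block
        first = sum(x[start:start + half])            # where psi_{j,k} > 0
        second = sum(x[start + half:start + block])   # where psi_{j,k} < 0
        coeffs.append(scale * (first - second))
    return coeffs
```

Because the Haar wavelet has one vanishing moment, a constant sequence yields coefficients equal to zero at every scale, which is the implicit differencing of order $M = 1$ mentioned above.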

Because translating the functions $\phi$ or $\psi$ by an integer amounts to
translating the sequence $\{W_{j,k}, k \in \mathbb{Z}\}$ by the same integer for all $j$, we can
suppose, without loss of generality, that the supports of $\phi$ and $\psi$ are included
in $[-T, 0]$ and $[0, T]$, respectively, for some integer $T \ge 1$. Using this
convention, it is easily seen that the wavelet coefficient $W_{j,k}$ depends only on
the available observations $\{X_1, \dots, X_n\}$ when $j \ge 0$ and $0 \le k < n_j$, where,
denoting the integer part of $x$ by $[x]$,

(7)   $n_j = \max([2^{-j}(n - T + 1) - T + 1],\, 0)$.
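The count in (7), together with the largest usable scale index $J_n = \max\{j : n_j \ge 1\}$ that appears later in (23), can be sketched as follows. The function names `n_coeffs` and `max_scale` are hypothetical, and the integer part in (7) is taken to be the floor.

```python
def n_coeffs(n, j, T):
    """Number n_j of available wavelet coefficients at scale j, as in (7).

    n is the sample size and T the support length of phi and psi.
    Since T - 1 is an integer, floor(2^{-j}(n-T+1) - T + 1) equals
    floor((n-T+1) / 2^j) - T + 1, which avoids floating point.
    """
    return max((n - T + 1) // (2 ** j) - T + 1, 0)


def max_scale(n, T):
    """Largest j with n_j >= 1, or None if no coefficient is available."""
    if n_coeffs(n, 0, T) < 1:
        return None
    j = 0
    # n_j is nonincreasing in j, so the first failure ends the search.
    while n_coeffs(n, j + 1, T) >= 1:
        j += 1
    return j
```

For $T = 1$ the count reduces to $[n 2^{-j}]$, the number of complete dyadic blocks of length $2^j$.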

Suppose that $X$ is a (possibly nonstationary) process with memory parameter
$d_0$ and generalized spectral measure $\nu$. If $M > d_0 - 1/2$, then $\Delta^M X$
is stationary and hence, by [14], Proposition 1, the sequence of wavelet
coefficients $W_{j,\cdot}$ is a stationary process and we can define
$\sigma_j^2(\nu) = \mathrm{Var}(W_{j,k})$.
Our estimator takes advantage of the scaling and weak dependence properties
of the wavelet coefficients, as expressed in the following condition, which
will be shown to hold in many cases of interest.



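Condition 1. There exist $\beta > 0$ and $\sigma^2 > 0$ such that

(8)   $\sup_{j \ge 1} 2^{\beta j} \left| \frac{\sigma_j^2(\nu)}{\sigma^2 2^{2 d_0 j}} - 1 \right| < \infty$

and

(9)   $\sup_{n \ge 1}\ \sup_{j = 1, \dots, J_n} \frac{1}{n_j (1 + n_j 2^{-2 \beta j})} \mathrm{Var}\left( \sum_{k=0}^{n_j - 1} \frac{W_{j,k}^2}{\sigma_j^2(\nu)} \right) < \infty$.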



Equation (8) states that, up to the multiplicative constant $\sigma^2$, the
variance $\sigma_j^2(\nu)$ is approximated by $2^{2 d_0 j}$ and that the error goes to zero
exponentially fast as a function of $j$. It is a direct consequence of the
approximation of the covariance of the wavelet coefficients established in [14].
Equation (9) imposes a bound on the variance of the normalized partial
sum of the stationary centered sequence $\{\sigma_j^{-2}(\nu) W_{j,k}^2\}$, which, provided that
$n_j 2^{-2 \beta j} = O(1)$, is equivalent to what occurs when these variables are
independent. We stress that the wavelet coefficients $W_{j,k}$ are, however, not
independent, nor can they be approximated by independent coefficients;
see [14]. Establishing (9) requires additional assumptions on the process $X$
that go beyond its covariance structure since $W_{j,k}^2$ is involved; see
Theorem 1, where this property is established for a general class of linear
processes. We have isolated relations (8) and (9) because in our
semiparametric context, these two relations are sufficient to show that the
wavelet Whittle estimator converges to $d_0$ at the optimal rate (see Theorem 3
below).

Let us recall some definitions and results from [14] which are used here.
As noted above, for a given scale $j$, the process $\{W_{j,k}\}_{k \in \mathbb{Z}}$ is covariance
stationary. It will be called the within-scale process because all the $W_{j,k}$,
$k \in \mathbb{Z}$, share the same $j$. The situation is more complicated when
considering two different scales $j > j'$ because the two-dimensional sequence
$\{[W_{j,k}, W_{j',k}]^T\}_{k \in \mathbb{Z}}$ is not stationary, as a consequence of the pyramidal
wavelet scheme. A convenient way to define a joint spectral density for
wavelet coefficients is to consider the between-scale process.

Definition 1. The sequence $\{[W_{j,k}, \mathbf{W}_{j,k}(j - j')^T]^T\}_{k \in \mathbb{Z}}$, where

$\mathbf{W}_{j,k}(j - j') = [W_{j', 2^{j-j'} k}, \dots, W_{j', 2^{j-j'} k + 2^{j-j'} - 1}]^T$,

is called the between-scale process at scales $0 \le j' \le j$. $\mathbf{W}_{j,k}(j - j')$ is a
$2^{j-j'}$-dimensional vector of wavelet coefficients at scale $j'$.
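The index bookkeeping in Definition 1 can be sketched as follows: for $u = j - j'$, the vector $\mathbf{W}_{j,k}(u)$ collects the $2^u$ finer-scale coefficients with indices $2^u k, \dots, 2^u k + 2^u - 1$. The function name `between_scale_block` and the list-based storage of the coefficients at scale $j'$ are assumptions made for this illustration.

```python
def between_scale_block(coeffs_fine, k, u):
    """Return the vector W_{j,k}(u) of Definition 1.

    coeffs_fine[i] holds W_{j',i} at the finer scale j' = j - u. The block
    paired with W_{j,k} has 2^u entries starting at index 2^u * k.
    """
    if u < 0 or k < 0:
        raise ValueError("u and k must be nonnegative")
    width = 2 ** u
    start = width * k
    if start + width > len(coeffs_fine):
        raise IndexError("not enough coefficients at the finer scale")
    return list(coeffs_fine[start:start + width])
```

With $u = 0$ the block reduces to the single coefficient $W_{j,k}$, which recovers the within-scale process.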

Assuming that the generalized spectral measure of $X$ is given by (2) and
provided that $M > d_0 - 1/2$, since $\Delta^M X$ is stationary, both the within-scale
process and the between-scale process are covariance stationary; see [14]. Let
us consider the case $\nu^* \in \mathcal{H}(\beta, \gamma, \pi)$, that is, $\varepsilon = \pi$, so that $\nu^*$ admits a density
$f^*$ in the space $\mathcal{H}(\beta, \gamma)$ as defined in [14] and $\nu$ admits a density
$f(\lambda) = |1 - e^{-i\lambda}|^{-2 d_0} f^*(\lambda)$. We denote by $\mathbf{D}_{j,0}(\cdot; f)$ the spectral density of the
within-scale process at scale index $j$ and by $\mathbf{D}_{j,j-j'}(\cdot; f)$ the cross spectral density
between $\{W_{j,k}\}_{k \in \mathbb{Z}}$ and $\{\mathbf{W}_{j,k}(j - j')\}_{k \in \mathbb{Z}}$ for $j' < j$. It will be convenient
to set $u = j - j'$. Theorem 1 in [14] states that, under (W-1)-(W-5), for all
$u \ge 0$, there exists $C > 0$ such that for all $\lambda \in (-\pi, \pi)$ and $j \ge u \ge 0$,

(10)   $|\mathbf{D}_{j,u}(\lambda; f) - f^*(0)\, \mathbf{D}_{\infty,u}(\lambda; d_0)\, 2^{2 j d_0}| \le C f^*(0)\, 2^{(2 d_0 - \beta) j}$,

where, for all $u \ge 0$, $d \in (1/2 - \alpha, M]$ and $\lambda \in (-\pi, \pi)$,

(11)   $\mathbf{D}_{\infty,u}(\lambda; d) = \sum_{l \in \mathbb{Z}} |\lambda + 2 l \pi|^{-2d}\, \mathbf{e}_u(\lambda + 2 l \pi)\, \overline{\hat\psi(\lambda + 2 l \pi)}\, \hat\psi(2^{-u}(\lambda + 2 l \pi))$,

with $\mathbf{e}_u(\xi) = 2^{-u/2} [1, e^{-i 2^{-u} \xi}, \dots, e^{-i (2^u - 1) 2^{-u} \xi}]^T$.

Remark 1. The condition (W-5) involves an upper and a lower bound.
The lower bound guarantees that the series defined by the right-hand side
of (11) omitting the term $l = 0$ converges uniformly for $\lambda \in (-\pi, \pi)$. The
upper bound guarantees that the term $l = 0$ is bounded at $\lambda = 0$. As a result,
$\mathbf{D}_{\infty,u}(\lambda; d)$ is bounded on $\lambda \in (-\pi, \pi)$ and, by (10), so is $\mathbf{D}_{j,u}(\lambda; f)$. In
particular, the wavelet coefficients are short-range dependent. For details, see the
proof of Theorem 1 in [14].

Remark 2. We stress that (10) may no longer hold if we only assume
$\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ with $\varepsilon < \pi$ since in this case, no condition is imposed on
$\nu(d\lambda)$ for $|\lambda| > \varepsilon$ and hence $W_{j,\cdot}$ may not have a density for all $j$. However,
this difficulty can be circumvented by decomposing $\nu^*$ as

(12)   $\nu^*(d\lambda) = f^*(\lambda)\, d\lambda + \tilde\nu^*(d\lambda)$,

where $f^*$ has support in $[-\varepsilon, \varepsilon]$ and $\tilde\nu^*([-\varepsilon, \varepsilon]) = 0$; see the proof of
Theorem 1.

Here is a simple interpretation of the bound (10). For any $d \in \mathbb{R}$,
$2^{2jd} \mathbf{D}_{\infty,u}(\cdot; d)$ is the spectral density of the wavelet coefficient of the
generalized fractional Brownian motion (GFBM) $\{B_{(d)}(\theta)\}$ defined as the
Gaussian process indexed by test functions
$\theta \in \Theta_{(d)} = \{\theta : \int_{\mathbb{R}} |\xi|^{-2d} |\hat\theta(\xi)|^2\, d\xi < \infty\}$ with mean
zero and covariance

(13)   $\mathrm{Cov}(B_{(d)}(\theta_1), B_{(d)}(\theta_2)) = \int_{\mathbb{R}} |\xi|^{-2d}\, \hat\theta_1(\xi)\, \overline{\hat\theta_2(\xi)}\, d\xi$.

When $d > 1/2$, the condition $\int |\xi|^{-2d} |\hat\theta(\xi)|^2\, d\xi < \infty$ requires that $\hat\theta(\xi)$ decays
sufficiently quickly at the origin and when $d < 0$, it requires that $\hat\theta(\xi)$
decreases sufficiently rapidly at infinity. Provided that $d \in (1/2 - \alpha, M + 1/2)$,
the wavelet function $\psi$ and its scaled and translated versions $\psi_{j,k}$ all belong
to $\Theta_{(d)}$. Defining the discrete wavelet transform of $B_{(d)}$ as
$W_{j,k}^{(d)} = B_{(d)}(\psi_{j,k})$, $j \in \mathbb{Z}$, $k \in \mathbb{Z}$, and
$\mathbf{W}_{j,k}^{(d)}(u) = [W_{j-u, 2^u k}^{(d)}, \dots, W_{j-u, 2^u k + 2^u - 1}^{(d)}]$, one obtains

(14)   $\mathrm{Cov}(W_{j,k}^{(d)}, \mathbf{W}_{j,k'}^{(d)}(u)) = 2^{2jd} \int_{-\pi}^{\pi} \mathbf{D}_{\infty,u}(\lambda; d)\, e^{i\lambda(k - k')}\, d\lambda$;

see [14], Remark 5, for more details. Equation (10) shows that the within-
and between-scale spectral densities $\mathbf{D}_{j,u}(\lambda; f)$ of the process $X$ with
memory parameter $d_0$ may be approximated by the corresponding densities of
the wavelet coefficients of the GFBM $B_{(d_0)}$ with an $L^\infty$-error bounded by
$O(2^{(2 d_0 - \beta) j})$.

The approximation (10) is a crucial step for proving that Condition 1 
holds for linear processes. The following theorem is proved in Section 5. 

Theorem 1. Let $X$ be a process having generalized spectral measure
(2) with $d_0 \in \mathbb{R}$ and with $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ such that
$f^*(0) = d\nu^*/d\lambda|_{\lambda=0} > 0$,
where $\gamma > 0$, $\beta \in (0, 2]$ and $\varepsilon \in (0, \pi]$. Then, under (W-1)-(W-5), the bound
(8) holds with $\sigma^2 = f^*(0) K(d_0)$, where

(15)   $K(d) = \int_{-\infty}^{\infty} |\xi|^{-2d} |\hat\psi(\xi)|^2\, d\xi$ for any $d \in (1/2 - \alpha, M + 1/2)$.

Suppose, in addition, that there exist an integer $k_0 \le M$ and a real-valued
sequence $\{a_k\}_{k \in \mathbb{Z}} \in \ell^2(\mathbb{Z})$ such that

(16)   $(\Delta^{k_0} X)_k = \sum_{t \in \mathbb{Z}} a_{k-t} Z_t, \quad k \in \mathbb{Z}$,

where $\{Z_t\}_{t \in \mathbb{Z}}$ is a weak white noise process such that $E[Z_t] = 0$, $E[Z_t^2] = 1$,
$E[Z_t^4] = E[Z_1^4] < \infty$ for all $t \in \mathbb{Z}$ and

(17)   $\mathrm{Cum}(Z_{t_1}, Z_{t_2}, Z_{t_3}, Z_{t_4}) = E[Z_1^4] - 3$ if $t_1 = t_2 = t_3 = t_4$, and $0$ otherwise.

Then, under (W-1)-(W-5), the bound (9) holds and Condition 1 is satisfied.



Remark 3. Relation (9) does not hold for every long-memory process
X, even with arbitrary moment conditions; see [5].

Remark 4. Any martingale increment process with constant finite fourth 
moment, as in the assumption A3' considered in [15], satisfies (17). Another 
particular case is given by the following corollary, proved in Section 5. 

The following result specializes Theorem 1 to a Gaussian process $X$ and
shows that at large scales, the wavelet coefficients of $X$ can be
approximated by those of a process $\bar X$ whose spectral measure $\bar\nu$ satisfies the global
condition $\bar\nu \in \mathcal{H}(\beta, \gamma, \pi)$.

Corollary 2. Let $X$ be a Gaussian process having generalized spectral
measure (2) with $d_0 \in \mathbb{R}$ and with $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ such that
$f^*(0) = d\nu^*/d\lambda|_{\lambda=0} > 0$, where $\gamma > 0$, $\beta \in (0, 2]$ and $\varepsilon \in (0, \pi]$. Then, under (W-1)-(W-5),
Condition 1 is satisfied with $\sigma^2 = f^*(0) K(d_0)$.

There exists, moreover, a Gaussian process $\bar X$ defined on the same
probability space as $X$ with generalized spectral measure $\bar\nu \in \mathcal{H}(\beta, \gamma, \pi)$ and wavelet
coefficients $\{\bar W_{j,k}\}$ such that

(18)   $\sup_{n \ge 1,\, j \ge 0} \{n_j 2^{j(1 + 2 d_0 - 2\alpha)} + n_j^2 2^{2j(1 - 2\alpha)}\}^{-1} \times E\left[ \left| \sum_{k=0}^{n_j - 1} W_{j,k}^2 - \sum_{k=0}^{n_j - 1} \bar W_{j,k}^2 \right| \right] < \infty$.

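3. Asymptotic behavior of the local Whittle wavelet estimator. We first
define the estimator. Let $\{c_{j,k}, (j,k) \in \mathcal{I}\}$ be an array of centered
independent Gaussian random variables with variance $\mathrm{Var}(c_{j,k}) = \sigma_{j,k}^2$, where $\mathcal{I}$ is
a finite set. The negative of its log-likelihood is
$(1/2) \sum_{(j,k) \in \mathcal{I}} \{c_{j,k}^2 / \sigma_{j,k}^2 + \log(\sigma_{j,k}^2)\}$, up to a constant additive term.
Our local Whittle wavelet estimator (LWWE) uses such a contrast process to
estimate the memory parameter $d_0$ by choosing $c_{j,k} = W_{j,k}$. The scaling and
weak dependence in Condition 1 then suggest the following pseudo negative
log-likelihood:

$\hat L_{\mathcal{I}}(\sigma^2, d) = (1/2) \sum_{(j,k) \in \mathcal{I}} \{W_{j,k}^2 / (\sigma^2 2^{2dj}) + \log(\sigma^2 2^{2dj})\}$
$\qquad = \frac{1}{2\sigma^2} \sum_{(j,k) \in \mathcal{I}} 2^{-2dj} W_{j,k}^2 + \frac{|\mathcal{I}|}{2} \log(\sigma^2 2^{2 \langle \mathcal{I} \rangle d})$,

where $|\mathcal{I}|$ denotes the number of elements of the set $\mathcal{I}$ and $\langle \mathcal{I} \rangle$ is defined as
the average scale,

(19)   $\langle \mathcal{I} \rangle = \frac{1}{|\mathcal{I}|} \sum_{(j,k) \in \mathcal{I}} j$.

Define $\hat\sigma_{\mathcal{I}}^2(d) = \mathrm{Argmin}_{\sigma^2 > 0} \hat L_{\mathcal{I}}(\sigma^2, d) = |\mathcal{I}|^{-1} \sum_{(j,k) \in \mathcal{I}} 2^{-2dj} W_{j,k}^2$. The
maximum pseudo-likelihood estimator of the memory parameter is then equal to
the minimum of the negative profile log-likelihood (see [21], page 403),
$\hat d_{\mathcal{I}} = \mathrm{Argmin}_{d \in \mathbb{R}} \hat L_{\mathcal{I}}(\hat\sigma_{\mathcal{I}}^2(d), d)$, that is,

(20)   $\hat d_{\mathcal{I}} = \mathrm{Argmin}_{d \in \mathbb{R}} \tilde L_{\mathcal{I}}(d)$, where $\tilde L_{\mathcal{I}}(d) = \log\left( \sum_{(j,k) \in \mathcal{I}} 2^{2d(\langle \mathcal{I} \rangle - j)} W_{j,k}^2 \right)$.

If $\mathcal{I}$ contains at least two different scales, then $\tilde L_{\mathcal{I}}(d) \to \infty$ as $d \to \pm\infty$
and thus $\hat d_{\mathcal{I}}$ is finite. The derivative of $\tilde L_{\mathcal{I}}(d)$ vanishes at $d = \hat d_{\mathcal{I}}$, that is,
$\hat S_{\mathcal{I}}(\hat d_{\mathcal{I}}) = 0$, where for all $d \in \mathbb{R}$,

(21)   $\hat S_{\mathcal{I}}(d) = \sum_{(j,k) \in \mathcal{I}} [j - \langle \mathcal{I} \rangle]\, 2^{-2jd}\, W_{j,k}^2$.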
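The definition of the estimator through (19)-(21) can be turned into a short numerical sketch. The code below is an illustration under stated assumptions, not the authors' implementation: the function name `lwwe`, the dictionary input mapping each scale $j$ to its list of coefficients, and the use of bisection are all choices made here. It solves $\hat S_{\mathcal{I}}(d) = 0$ by working with $-2^{2d\langle \mathcal{I} \rangle} \hat S_{\mathcal{I}}(d)$, which is increasing in $d$ and has the same root.

```python
def lwwe(coeffs_by_scale, tol=1e-12):
    """Local Whittle wavelet estimate of d from {j: [W_{j,0}, W_{j,1}, ...]}.

    Minimizes the profile contrast in (20) by locating the unique root of
    its derivative, cf. (21), with a bracketed bisection.
    """
    scales = sorted(j for j, w in coeffs_by_scale.items() if len(w) > 0)
    if len(scales) < 2:
        raise ValueError("at least two different scales are required")
    counts = {j: len(coeffs_by_scale[j]) for j in scales}
    energy = {j: sum(w * w for w in coeffs_by_scale[j]) for j in scales}
    total = sum(counts.values())
    mean_scale = sum(j * counts[j] for j in scales) / total  # <I> in (19)

    def slope(d):
        # Equals -2^{2 d <I>} * S_I(d); same sign as the derivative of (20).
        return sum((mean_scale - j) * energy[j]
                   * 2.0 ** (2.0 * d * (mean_scale - j)) for j in scales)

    lo, hi = -1.0, 1.0
    while slope(lo) >= 0.0:          # widen the bracket to the left
        lo *= 2.0
        if lo < -32.0:
            raise ValueError("no sign change found for d >= -32")
    while slope(hi) <= 0.0:          # widen the bracket to the right
        hi *= 2.0
        if hi > 32.0:
            raise ValueError("no sign change found for d <= 32")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

As a sanity check, if every squared coefficient equals $\sigma^2 2^{2 d_0 j}$ exactly, the root is $d_0$, because $\sum_{(j,k)} (j - \langle \mathcal{I} \rangle) = 0$ by (19). The bracket limit of 32 is an arbitrary safeguard against floating-point overflow.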

We consider two specific choices for $\mathcal{I}$. For any integers $n$, $j_0$ and $j_1$, $j_0 \le j_1$,
the set of all available wavelet coefficients from $n$ observations $X_1, \dots, X_n$
having scale indices between $j_0$ and $j_1$ is

(22)   $\mathcal{I}_n(j_0, j_1) = \{(j, k) : j_0 \le j \le j_1,\ 0 \le k < n_j\}$,

where $n_j$ is given in (7). Consider two sequences, $\{L_n\}$ and $\{U_n\}$, satisfying,
for all $n$,

(23)   $0 \le L_n < U_n \le J_n$, $\quad J_n = \max\{j : n_j \ge 1\}$.

The index $J_n$ is the maximal available scale index for the sample size $n$; $L_n$
and $U_n$ will denote, respectively, the lower and upper scale indices used in the
pseudo-likelihood function. The estimator will then be denoted $\hat d_{\mathcal{I}_n(L_n, U_n)}$.
As shown below, in the semiparametric framework, the lower scale $L_n$
governs the rate of convergence of $\hat d_{\mathcal{I}_n(L_n, U_n)}$ toward the true memory parameter.
There are two possible settings as far as the upper scale $U_n$ is concerned:

(S-1) $U_n - L_n$ is fixed, equal to $\ell > 0$;

(S-2) $U_n \le J_n$ for all $n$ and $U_n - L_n \to \infty$ as $n \to \infty$.

(S-1) corresponds to using a fixed number of scales and (S-2) corresponds
to using a number of scales tending to infinity. We will establish the large
sample properties of $\hat d_{\mathcal{I}_n(L_n, U_n)}$ for these two cases.

The following theorem, proved in Section 8, states that under Condition 1,
the estimator $\hat d_{\mathcal{I}_n(L_n, U_n)}$ is consistent.

Theorem 3 (Rate of convergence). Assume Condition 1. Let $\{L_n\}$ and
$\{U_n\}$ be two sequences satisfying (23) and suppose that, as $n \to \infty$,

(24)   $L_n^2 (n 2^{-L_n})^{-1/4} + L_n^{-1} \to 0$.

The estimator $\hat d_{\mathcal{I}_n(L_n, U_n)}$ defined by (20) and (22) is then consistent with a
rate given by

(25)   $\hat d_{\mathcal{I}_n(L_n, U_n)} = d_0 + O_P\{(n 2^{-L_n})^{-1/2} + 2^{-\beta L_n}\}$.

By balancing the two terms in the bound (25), we obtain the optimal
rate.

Corollary 4 (Optimal rate). When $n \asymp 2^{(1 + 2\beta) L_n}$, we obtain the rate

(26)   $\hat d_{\mathcal{I}_n(L_n, U_n)} = d_0 + O_P(n^{-\beta/(1 + 2\beta)})$.

Proof. By taking $n \asymp 2^{(1 + 2\beta) L_n}$, the condition $L_n^{-1} + L_n^2 (n 2^{-L_n})^{-1/4} \to 0$
is satisfied and $(n 2^{-L_n})^{-1/2} \asymp 2^{-\beta L_n} \asymp n^{-\beta/(1 + 2\beta)}$. This is the minimax rate
[7]. □
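The balancing argument behind Corollary 4 can be checked numerically. The sketch below solves $n = 2^{(1 + 2\beta) L}$ for a real-valued $L$ and evaluates the two terms of (25); the function names `balanced_lower_scale` and `rate_terms` are hypothetical, and in practice $L_n$ must be an integer, so the balance only holds up to rounding.

```python
import math


def balanced_lower_scale(n, beta):
    """Real-valued L solving n = 2^{(1 + 2*beta) * L}."""
    return math.log2(n) / (1.0 + 2.0 * beta)


def rate_terms(n, L, beta):
    """Return the two terms of (25): (n 2^{-L})^{-1/2} and 2^{-beta L}."""
    variance_term = (n * 2.0 ** (-L)) ** -0.5
    bias_term = 2.0 ** (-beta * L)
    return variance_term, bias_term
```

For instance, with $n = 2^{20}$ and $\beta = 2$ the balanced scale is $L = 4$, and both terms equal $2^{-8} = n^{-\beta/(1 + 2\beta)}$.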

Remark 5. Observe that the setting of Theorem 3 includes both cases (S-l) 
and (S-2). The difference between these settings will appear when computing 
the limit variance in the Gaussian case; see Theorem 5 below. 

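We shall now state a central limit theorem for the estimator $\hat d_{\mathcal{I}_n(L_n, U_n)}$ of
$d_0$, under the additional assumption that $X$ is a Gaussian process.
Extensions to non-Gaussian linear processes will be considered in a future work.
We denote by $|\cdot|$ the Euclidean norm and define, for all $d \in (1/2 - \alpha, M]$
and $u \in \mathbb{N}$,

(27)   $I_u(d) = \int_{-\pi}^{\pi} |\mathbf{D}_{\infty,u}(\lambda; d)|^2\, d\lambda = (2\pi)^{-1} \sum_{\tau \in \mathbb{Z}} \mathrm{Cov}^2(W_{0,0}^{(d)}, \mathbf{W}_{0,\tau}^{(d)}(u))$,

where we have used (14). We denote, for all integer $\ell \ge 1$,

(28)   $\eta_\ell = \sum_{j=0}^{\ell} j\, \frac{2^{-j}}{2 - 2^{-\ell}}$ and $\kappa_\ell = \sum_{j=0}^{\ell} (j - \eta_\ell)^2\, \frac{2^{-j}}{2 - 2^{-\ell}}$,

(29)   $V(d_0, \ell) = \frac{\pi}{(2 - 2^{-\ell})\, \kappa_\ell\, (\log(2) K(d_0))^2} \times \left\{ I_0(d_0) + \frac{2}{\kappa_\ell} \sum_{u=1}^{\ell} I_u(d_0)\, 2^{(2 d_0 - 1) u} \sum_{i=0}^{\ell - u} \frac{2^{-i}}{2 - 2^{-\ell}} (i - \eta_\ell)(i + u - \eta_\ell) \right\}$,

(30)   $V(d_0, \infty) = \frac{\pi}{[2 \log(2) K(d_0)]^2} \left\{ I_0(d_0) + 2 \sum_{u=1}^{\infty} I_u(d_0)\, 2^{(2 d_0 - 1) u} \right\}$,

where $K(d)$ is defined in (15). The following theorem is proved in Section 8.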



Theorem 5 (CLT). Let $X$ be a Gaussian process having generalized
spectral measure (2) with $d_0 \in \mathbb{R}$ and $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$ with $\nu^*(-\varepsilon, \varepsilon) > 0$,
where $\gamma > 0$, $\beta \in (0, 2]$ and $\varepsilon \in (0, \pi]$. Let $\{L_n\}$ be a sequence such that

(31)   $L_n^2 (n 2^{-L_n})^{-1/4} + n 2^{-(1 + 2\beta) L_n} \to 0$

and $\{U_n\}$ be a sequence such that either (S-1) or (S-2) holds. Then, under
(W-1)-(W-5), we have, as $n \to \infty$,

(32)   $(n 2^{-L_n})^{1/2} (\hat d_{\mathcal{I}_n(L_n, U_n)} - d_0) \xrightarrow{\mathcal{L}} \mathcal{N}[0, V(d_0, \ell)]$,

where $\ell = \lim_{n \to \infty} (U_n - L_n) \in \{1, 2, \dots, \infty\}$.

Remark 6. The condition (31) is similar to (24), but ensures, in
addition, that the bias in (25) is asymptotically negligible.

Remark 7. The larger the value of $\beta$, the smaller the size of the allowed
range for $d_0$ in (W-5) for a given decay exponent $\alpha$ and number $M$ of
vanishing moments. Indeed, the range in (W-5) has been chosen so as to
obtain a bound on the bias which corresponds to the best possible rate under
the condition $\nu^* \in \mathcal{H}(\beta, \gamma, \varepsilon)$. If (W-5) is replaced by the weaker condition
$d_0 \in (1/2 - \alpha, M]$, which does not depend on $\beta$, the same CLT (32) holds,
but $\beta$ in condition (31) must be replaced by $\beta' \in (0, \beta]$. This $\beta'$ must satisfy
$1/2 - \alpha < (1 + \beta')/2 - \alpha < d_0$, that is, $0 < \beta' < 2(d_0 + \alpha) - 1$. When $\beta' < \beta$,
one gets a slower rate in (32).

Remark 8. Relation (32) holds under (S-1), where $\ell < \infty$, and (S-2),
where $\ell = \infty$. It follows from (72) and (74) that $V(d_0, \ell) \to V(d_0, \infty) < \infty$
as $\ell \to \infty$. Our numerical experiments suggest that in some cases, one may
have $V(d_0, \ell) < V(d_0, \ell')$ with $\ell < \ell'$; see the bottom left panel of Figure 1.
In that figure, one indeed notices a bending of the curves for large $d$, which
is more pronounced for small values of $M$ and may be due to a correlation
between the wavelet coefficients across scales.

Remark 9. The most natural choice is $U_n = J_n$, which amounts to
using all the available wavelet coefficients with scale index larger than $L_n$. The
case (S-1) is nevertheless of interest. In practice, the number of observations
$n$ is finite and the number of available scales $J_n - L_n$ can be small. Since,
when $n$ is finite, it is always possible to interpret the estimator $\hat d_{\mathcal{I}_n(L_n, J_n)}$
as $\hat d_{\mathcal{I}_n(L_n, L_n + \ell)}$ with $\ell = J_n - L_n$, one may approximate the distribution of
$(n 2^{-L_n})^{1/2} (\hat d_{\mathcal{I}_n(L_n, J_n)} - d_0)$ either by $\mathcal{N}(0, V(d_0, \ell))$ or by $\mathcal{N}(0, V(d_0, \infty))$.
Since the former involves only a single limit, it is likely to provide a better
approximation for finite $n$. Another interesting application involves
considering online estimators of $d_0$: online computation of wavelet coefficients is
easier when the number of scales is fixed; see [19].

4. Discussion. The asymptotic variance $V(d, \ell)$ is defined for all $\ell \in \{1, 2,
\dots, \infty\}$ and all $1/2 - \alpha < d \le M$ by (29) and (30). Its expression involves the
range of scales $\ell$ and the $L^2$-norm $I_u(d)$ of the asymptotic spectral density
$\mathbf{D}_{\infty,u}(\lambda; d)$ of the wavelet coefficients, both for the "within" scales ($u = 0$)
and the "between" scales ($u > 0$). The choice of wavelets does not matter
much, as Figure 1 indicates. One can use Daubechies wavelets or Coiflets
(for which the scale function also has vanishing moments). What matters
is the number of vanishing moments $M$ and the decay exponent $\alpha$, which
both determine the frequency resolution of $\psi$. For wavelets derived from a
multiresolution analysis, $M$ is always known and [3], Remark 2.7.1, page 86,
provides a sequence of lower bounds tending to $\alpha$ (we used such lower bounds
for the Coiflets used below). For the Daubechies wavelet with $M$ vanishing
moments, an analytic formula giving $\alpha$ is available; see [4], equation (7.1.23),
page 225 and the table on page 226, and note that our $\alpha$ equals the $\alpha$ of [4]
plus 1.



4.1. The ideal Shannon wavelet case. The so-called Shannon wavelet $\psi_S$
is such that its Fourier transform $\hat\psi_S$ satisfies $|\hat\psi_S(\xi)|^2 = 1$ for $|\xi| \in [\pi, 2\pi]$ and
is zero otherwise. This wavelet satisfies (W-2)-(W-4) for arbitrary large $M$
and $\alpha$, but does not have compact support, hence it does not satisfy (W-1).
We may not, therefore, choose this wavelet in our analysis. It is of interest,
however, because it gives a rough idea of what happens when $\alpha$ and $M$ are
large since one can always construct a wavelet $\psi$ satisfying (W-1)-(W-4)
which is arbitrarily close to the Shannon wavelet. Using the Shannon wavelet
in (11), we get, for all $\lambda \in (-\pi, \pi)$, $\mathbf{D}_{\infty,u}(\lambda; d) = 0$ for $u \ge 1$ and
$\mathbf{D}_{\infty,0}(\lambda; d) = (2\pi - |\lambda|)^{-2d}$ so that, for all $d \in \mathbb{R}$, (29) becomes

(33)   $V(d, \ell) = \frac{\pi\, g(-4d)}{2 (2 - 2^{-\ell})\, \kappa_\ell\, \log^2(2)\, g^2(-2d)}$, where $g(x) = \int_{\pi}^{2\pi} \lambda^x\, d\lambda$.

This $V(d, \ell)$ is displayed in Figure 1.
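The Shannon approximation can be evaluated in closed form. The sketch below assumes that (33) reads $V(d, \ell) = \pi g(-4d) / \{2 (2 - 2^{-\ell}) \kappa_\ell \log^2(2) g^2(-2d)\}$ with $g(x) = \int_\pi^{2\pi} \lambda^x d\lambda$, and that $\eta_\ell$ and $\kappa_\ell$ are the weighted mean and variance of $j \in \{0, \dots, \ell\}$ under weights proportional to $2^{-j}$, as in (28). All function names are hypothetical.

```python
import math


def eta(l):
    """Weighted mean of j in {0,...,l} with weights 2^{-j} / (2 - 2^{-l})."""
    norm = 2.0 - 2.0 ** (-l)
    return sum(j * 2.0 ** (-j) for j in range(l + 1)) / norm


def kappa(l):
    """Weighted variance of j under the same weights."""
    norm = 2.0 - 2.0 ** (-l)
    m = eta(l)
    return sum((j - m) ** 2 * 2.0 ** (-j) for j in range(l + 1)) / norm


def g(x):
    """Integral of lambda^x over [pi, 2*pi]."""
    if abs(x + 1.0) < 1e-12:
        return math.log(2.0)  # antiderivative is log(lambda) when x = -1
    return ((2.0 * math.pi) ** (x + 1.0) - math.pi ** (x + 1.0)) / (x + 1.0)


def shannon_variance(d, l):
    """Assumed form of (33): asymptotic variance for the Shannon wavelet."""
    denom = (2.0 * (2.0 - 2.0 ** (-l)) * kappa(l)
             * math.log(2.0) ** 2 * g(-2.0 * d) ** 2)
    return math.pi * g(-4.0 * d) / denom
```

At $d = 0$ this reduces to $\{2 (2 - 2^{-\ell}) \kappa_\ell \log^2(2)\}^{-1}$, and by the Cauchy-Schwarz inequality $g^2(-2d) \le \pi g(-4d)$, so under this form the value at $d = 0$ is the smallest over $d$ for a fixed $\ell$.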

4.2. Universal lower bound for $I_0(d)$. For $\ell = \infty$, using the facts that
$I_u(d) \ge 0$ for $u \ge 1$ and, by the Jensen inequality in (27), $I_0(d) \ge K^2(d)/(2\pi)$,
we have, for all $1/2 - \alpha < d \le M$,

(34)   $V(d, \infty) \ge (8 \log^2(2))^{-1} \simeq 0.2602$.

This inequality is sharp when $d = 0$ and the wavelet family $\{\psi_{j,k}\}_{j,k}$ forms an
orthonormal basis. This is because, in this case, the lower bound $(8 \log^2(2))^{-1}$
in (34) equals $V(0, \infty)$. Indeed, by (13) and Parseval's theorem, the wavelet
coefficients $\{B_{(0)}(\psi_{j,k})\}_{j,k}$ are a centered white noise with variance $2\pi$ and,
by (15) and (27), $K(0) = 2\pi$ and $I_u(0) = 2\pi \mathbf{1}(u = 0)$. Then,
$V(0, \ell) = (2 (2 - 2^{-\ell})\, \kappa_\ell \log^2(2))^{-1}$. Since $\kappa_\ell$ is increasing with $\ell$ and tends to 2 as $\ell \to \infty$ (see
Lemma 13), $V(0, \ell) \ge (8 \log^2(2))^{-1} = V(0, \infty)$. Hence, the lower bound (34)
is attained at $d_0 = 0$ if $\{\psi_{j,k}\}_{j,k}$ is an orthonormal basis.
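The behavior of $V(0, \ell)$ described above is easy to tabulate. The following sketch assumes that $\eta_\ell$ and $\kappa_\ell$ in (28) are the mean and variance of $j \in \{0, \dots, \ell\}$ under weights $2^{-j}/(2 - 2^{-\ell})$, and uses the expression $V(0, \ell) = \{2 (2 - 2^{-\ell}) \kappa_\ell \log^2(2)\}^{-1}$ stated for an orthonormal wavelet basis. The function names are hypothetical.

```python
import math


def kappa(l):
    """kappa_l: variance of j in {0,...,l} under weights 2^{-j}/(2 - 2^{-l})."""
    norm = 2.0 - 2.0 ** (-l)
    mean = sum(j * 2.0 ** (-j) for j in range(l + 1)) / norm  # eta_l
    return sum((j - mean) ** 2 * 2.0 ** (-j) for j in range(l + 1)) / norm


def variance_at_zero(l):
    """V(0, l) for an orthonormal wavelet basis."""
    return 1.0 / (2.0 * (2.0 - 2.0 ** (-l)) * kappa(l) * math.log(2.0) ** 2)


def lower_bound():
    """Right-hand side of (34), which is also V(0, infinity)."""
    return 1.0 / (8.0 * math.log(2.0) ** 2)
```

Evaluating `variance_at_zero` for increasing `l` shows the values decreasing toward `lower_bound()`, in line with $\kappa_\ell$ increasing to 2.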

4.3. Numerical computations. For a given wavelet $\psi$, we can compute the
variances $V(d, \ell)$ numerically for any $\ell = 1, 2, \dots, \infty$ and $1/2 - \alpha < d \le M$.
It is easily shown that $d \mapsto V(d, \ell)$ is infinitely differentiable on
$1/2 - \alpha < d \le M$ so that interpolation can be used between two different values of
$d$. We compared numerical values of $V(d, \ell)$ for four different wavelets, with
$\ell = 4, 6, 8, 10$, and compared them with the Shannon approximation (33); see
Figure 1. We used as wavelets two Daubechies wavelets which have $M = 2$
and $M = 4$ vanishing moments, and $\alpha = 1.3390$ and $\alpha = 1.9125$ decay
exponents, respectively, and two so-called Coiflets with the same number of
vanishing moments, and $\alpha > 1.6196$ and $\alpha > 1.9834$ decay exponents
respectively. For a given number $M$ of vanishing moments, the Coiflet has a larger
support than the Daubechies wavelet, resulting in a better decay exponent.
The asymptotic variances are different for $M = 2$; in particular, for negative
$d$'s, the Coiflet asymptotic variance is closer to that of the Shannon wavelet.
The asymptotic variances are very close for $M = 4$.

Fig. 1. Numerical computations of the asymptotic variance $V(d, \ell)$ for the Coiflets and
Daubechies wavelets for different values of the number of scales $\ell = 4, 6, 8, 10$ and of
the number of vanishing moments $M = 2, 4$. Top row: Coiflets; bottom row: Daubechies
wavelets; left column: $M = 2$; right column: $M = 4$. The dash-dot lines are the asymptotic
variances for the Shannon wavelet [see (33)] with $\ell = 4, 6, 8, 10$. For a given $\ell$, the
variances for different orthogonal wavelets coincide at $d = 0$; see the comment following (34).
The right and left columns have different horizontal scales because different values of $M$
yield different ranges for $d$.

4.4. Comparison with Fourier estimators. Semiparametric Fourier
estimators are based on the periodogram. To allow comparison with Fourier
estimators, we must first link the normalization factor $n 2^{-L_n}$ with the
bandwidth parameter $m_n$ (the index of the largest normalized frequency) used by
semiparametric Fourier estimators. A Fourier estimator with bandwidth $m_n$
projects the observations $[X_1 \dots X_n]^T$ on the space generated by the vectors
$\{\cos(2\pi k \cdot / n), \sin(2\pi k \cdot / n)\}$, $k = 1, \dots, m_n$, whose dimension is $2 m_n$; on the
other hand, the wavelet coefficients $\{W_{j,k}, j \ge L_n, k = 0, \dots, n_j - 1\}$ used in
the wavelet estimator correspond to a projection on a space whose
dimension is at most $\sum_{j = L_n}^{J_n} n_j \sim 2 n 2^{-L_n}$, where the equivalence holds as $n \to \infty$
and $n 2^{-L_n} \to \infty$, by applying (75) with $j_0 = L_n$, $j_1 = J_n$ and $p = 1$. Hence,
for $m_n$ or $n 2^{-L_n}$ large, it makes sense to consider $n 2^{-L_n}$ as an analog of the
bandwidth parameter $m_n$. The maximal scale index $U_n$ is similarly related
to the trimming number (the index of the smallest normalized frequency),
often denoted by $l_n$ (see [16]), that is, $l_n \sim n 2^{-U_n}$. We stress that, in the absence
of trends, there is no need to trim the coarsest scales.

With the above notation, the assumption (24) in Theorem 3 becomes
$m_n / n + (\log(n / m_n))^8 m_n^{-1} \to 0$ and the conclusion (25) is expressed as
$\hat d = d_0 + O_P(m_n^{-1/2} + (m_n / n)^{\beta})$. The assumption (31) becomes
$(\log(n / m_n))^8 m_n^{-1} + m_n^{1 + 2\beta} / n^{2\beta} \to 0$ and the rate of convergence in (32) is $m_n^{1/2}$.

The most efficient Fourier estimator is the local Whittle (Fourier)
estimator studied in [15]; provided that

(1) the process $\{X_k\}$ is stationary and has spectral density
$f(\lambda) = |1 - e^{-i\lambda}|^{-2 d_0} f^*(\lambda)$
with $d_0 \in (-1/2, 1/2)$ and $f^*(\lambda) = f^*(0) + O(|\lambda|^{\beta})$ as $\lambda \to 0$,

(2) the process $\{X_k\}$ is linear and causal, $X_k = \sum_{j=0}^{\infty} a_j Z_{k-j}$, where $\{Z_k\}$
is a martingale increment sequence satisfying $E[Z_k^2 \mid \mathcal{F}_{k-1}] = 1$ a.s.,
$E[Z_k^3 \mid \mathcal{F}_{k-1}] = \mu_3$ a.s. and $E[Z_k^4] = E[Z_1^4]$, where
$\mathcal{F}_k = \sigma(Z_{k-l}, l \ge 0)$ and $a(\lambda) = \sum_{k=0}^{\infty} a_k e^{ik\lambda}$
is differentiable in a neighborhood $(0, \delta)$ of the origin and
$|da/d\lambda(\lambda)| = O(|a(\lambda)| / \lambda)$ as $\lambda \to 0^+$ (see A2'),

(3) $m_n^{-1} + (\log m_n)^2 m_n^{1 + 2\beta} / n^{2\beta} \to 0$ (see A4'),

then $m_n^{1/2} (\hat d_{m_n} - d_0)$ is asymptotically zero-mean Gaussian with variance
1/4. This asymptotic variance is smaller than (but very close to) our lower
bound in (34) and comparable to the asymptotic variance obtained
numerically for the Daubechies wavelet with two vanishing moments; see the
left-hand panel in Figure 1. Also, note that while the asymptotic variance of
the Fourier estimators is a constant, the asymptotic variances of the wavelet
estimators depend on $d_0$ (see Figure 1). In practice, one estimates the
limiting variance $V(d_0, \ell)$ by $V(\hat d, \ell)$ in order to construct asymptotic confidence
intervals. The continuity of V(-,£) and the consistency of d justify this pro- 
cedure. 
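
As a rough illustration of the plug-in procedure, the following sketch builds a two-sided asymptotic confidence interval from a normal limit of the form $(n2^{-L_n})^{1/2}(\hat d - d_0) \to N(0, V(d_0,\ell))$, which is how we read (32). The function name `wavelet_whittle_ci` and its arguments are hypothetical; in particular, the plug-in variance $V(\hat d,\ell)$ from (29) or (30) is assumed to be computed elsewhere and passed in as `v_hat`.

```python
from statistics import NormalDist


def wavelet_whittle_ci(d_hat, v_hat, n, L, level=0.95):
    """Plug-in interval d_hat +/- z * sqrt(v_hat / (n * 2**-L)).

    d_hat : estimate of the memory parameter
    v_hat : plug-in asymptotic variance V(d_hat, l), supplied by the caller
            (assumption: this sketch does not evaluate (29) or (30))
    n, L  : sample size and finest scale index, so that n * 2**-L plays
            the role of the bandwidth m_n
    """
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if v_hat < 0.0:
        raise ValueError("v_hat must be nonnegative")
    m = n * 2.0 ** (-L)                          # analog of the bandwidth m_n
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # standard normal quantile
    half_width = z * (v_hat / m) ** 0.5
    return d_hat - half_width, d_hat + half_width
```

For instance, `wavelet_whittle_ci(0.3, 0.6, 2**16, 6)` uses $n2^{-L} = 1024$; halving the variance or quadrupling $n2^{-L}$ shrinks the interval accordingly.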

We would like to stress, however, that the wavelet estimator has some distinctive 
advantages. From a theoretical standpoint, for a given $\beta$, the wavelet 
estimator is rate optimal, that is, for $\beta \in (0,2]$, the rate is $n^{\beta/(1+2\beta)}$ 
(see Corollary 4) and the CLT is obtained for any rate $o(n^{\beta/(1+2\beta)})$. For the 
local Whittle Fourier estimator, the best rate of convergence is 
$O((n/\log^2(n))^{\beta/(1+2\beta)})$ and the CLT is obtained for any rate 
$o((n/\log^2(n))^{\beta/(1+2\beta)})$. This means that for any given $\beta$, the wavelet 
estimator has a faster rate of convergence and can therefore yield, for an 
appropriate admissible choice of the finest scale, shorter confidence intervals. 
Another advantage of the wavelet Whittle estimator over the local Whittle Fourier 
estimator is that the optimal rate of convergence is shown to hold for 
$\nu^* \in \mathcal{H}(\beta,\gamma,\varepsilon)$ without any further regularity assumption, such 
as the density $f^*$ of $\nu^*$ having to be differentiable in a neighborhood of zero, 
with a given growth of the logarithmic derivative. To the best of our knowledge, 
the GPH estimator is the only Fourier estimator which has been shown, 
in a Gaussian context, to achieve the rate $O(n^{\beta/(1+2\beta)})$ (see [7]); its 
asymptotic variance is $\pi^2/24 \approx 0.4112$. It is larger than the lower bound (34) and 
larger than the asymptotic variance obtained by using standard Daubechies 
wavelets with $\ell \ge 6$ on the range $(-1/2, 1/2)$ of $d_0$ allowed for the GPH 
estimator (see Figure 1). When pooling frequencies, the asymptotic variance 
of the GPH estimator improves and tends to 1/4 (the local Whittle Fourier 
asymptotic variance) as the number of pooled frequencies tends to infinity; 
see [16]. 
Thus far, we have compared our local Whittle wavelet estimator with the 
local Whittle Fourier (LWF) and GPH estimators in the context of a stationary 
and invertible process $X$, that is, for $d_0 \in (-1/2,1/2)$. As already 
mentioned, the wavelet estimators can be used for arbitrarily large ranges 
of the parameter $d_0$ by appropriately choosing the wavelet so that (W-5) 
holds. There are two main ways of adapting the LWF estimator to larger 
ranges of $d$: differencing and tapering the data (see [22]) or, as promoted 
by [20], modifying the local Whittle likelihood, yielding the so-called exact 
local Whittle Fourier (ELWF) estimator. The theoretical analysis of these 
methods is performed under the same set of assumptions as in [15], so the 
same comments on the nonoptimality of the rate and on the restriction on 
$f^*$ apply. Also, note that the model considered by [20] for $X$ differs from 
the model of integrated processes defined by (16) and is not time-shift 
invariant; see their equation (1). In addition, their estimator is not invariant 
under the addition of a constant in the data, a drawback which is not easily 
dealt with; see their Remark 2. The asymptotic variance of the ELWF 
estimator has been shown to be 1/4, the same as the LWF estimator, provided 
that the range $(\Delta_1,\Delta_2)$ for $d_0$ is of width $\Delta_2 - \Delta_1 \le 9/2$. The asymptotic 
variance of our local Whittle wavelet estimator with eight scales, using the 
Daubechies wavelet with $M = 4$ zero moments, is at most 0.6 on a range 
of the same width; see the left-hand panel in Figure 1. Again, this comparison 
does not take into account the logarithmic factor in the rate of convergence 
imposed by the conditions on the bandwidth $m_n$. Concerning the asymptotic 
variances of tapered Fourier estimators, increasing the allowed range 
for $d_0$ means increasing the taper order (see [8] and [17]), which, as already 
explained, inflates the asymptotic variance of the estimates. In contrast, for 
the wavelet methods, by increasing the number of vanishing moments $M$ 
of, say, a Daubechies wavelet, the allowed range for $d_0$ is arbitrarily large 
while the asymptotic variance converges to the ideal Shannon wavelet case, 
derived in (33); the numerical values are displayed in Figure 1 for different 
values of the number of scales $\ell$. The figure shows that larger values of $\ell$ tend 
to yield a smaller asymptotic variance. One should thus choose the largest 
possible $M$ and the maximal number of scales. This prescription cannot be 
applied to a small sample because increasing the support of the wavelet 
decreases the number of available scales. The Daubechies wavelets with $M = 2$ 
to $M = 4$ are commonly used in practice. 

From a practical standpoint, the wavelet estimator is computationally 
more efficient than the aforementioned Fourier estimators. Using the fast 
pyramidal algorithm, the wavelet transform coefficients are computed in 
$O(n)$ operations. The function $d \mapsto \tilde L_{\mathcal I}(d)$ can be minimized using the 
Newton algorithm [2], Chapter 9.5, whose convergence is guaranteed because 
$\tilde L_{\mathcal I}(d)$ is convex in $d$. The complexity of the minimization procedure is 
related to the computational cost of evaluating the function $\tilde L_{\mathcal I}$ and its 
first two derivatives. Assume that these functions need to be evaluated at $p$ 
distinct values $d_1,\dots,d_p$. We first compute the empirical variances of the 
wavelet coefficients, $n_j^{-1}\sum_{k=0}^{n_j-1} W_{j,k}^2$, for the scales $j \in \{L_n,\dots,U_n\}$, which 
do not depend on $d$ and require $O(n)$ operations. For $\mathcal I = \mathcal I_n(L_n,U_n)$, $\tilde L_{\mathcal I}$ and 
all of its derivatives can be computed from linear combinations of these 
$U_n - L_n + 1 = O(\log(n))$ empirical variances with weights depending on $d$. The 
total complexity for computing the wavelet Whittle estimator in an algorithm 
involving $p$ iterations is thus $O(n + p\log(n))$. The local Whittle Fourier (LWF) 
contrast being convex, the same Newton algorithm converges, but the 
complexity is slightly higher. The computation of the Fourier coefficients 
requires $O(n\log(n))$ operations. The number of terms in the LWF contrast 
function (see [15], page 1633) is of order $m_n$ [which is typically of order 
$O(n^{\gamma})$, where $\gamma \in (0, 1/(1+2\beta))$], so the evaluation of the LWF contrast 
function (and its derivatives) for $p$ distinct values of the memory parameter 
$d_1,\dots,d_p$ requires $O(pm_n)$ operations. The overall complexity of computing 
the LWF estimator in a Newton algorithm involving $p$ steps is therefore 
$O(n\log(n) + pm_n)$. Differencing and tapering the data only adds $O(n)$ 
operations, so the same complexity applies in this case. The ELWF estimator 
is much more computationally demanding: for each value of the memory 
coefficient $d$ at which the pseudo-likelihood function is evaluated, the 
algorithm calls for the fractional integration or differentiation of the 
observations, namely, $(\Delta^d X)_k$, $k = 1,\dots,n$, and the computation of the 
Fourier transform of $\{(\Delta^d X)_1,\dots,(\Delta^d X)_n\}$. In this context, 

$(\Delta^d X)_k := \sum_{l=0}^{k-1} \frac{(-d)_l}{l!} X_{k-l}$, $k = 1,\dots,n$, 

where $(x)_0 = 1$ and $(x)_k = x(x+1)\cdots(x+k-1)$ for $k \ge 1$ denote the Pochhammer 
symbols. The complexity of this procedure is thus $O(n^2 + n\log(n))$. The 
complexity for $p$ function evaluations, therefore, is $O(p(n^2 + n\log(n)))$. The 
convexity of the criterion is not assured, so a minimization algorithm can 
possibly be trapped in a local minimum. These drawbacks make the ELWF 
estimator impractical for large data sets, say of size $10^6$ to $10^7$, as encountered in 
teletraffic analysis or high-frequency financial data. 
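
The cost accounting for the wavelet estimator can be made concrete with a short sketch. It assumes that the contrast has the form $\tilde L_{\mathcal I}(d) = \log\big(\sum_{(j,k)\in\mathcal I} 2^{-2dj}W_{j,k}^2\big) + 2d\log(2)\langle\mathcal I\rangle$, which is our reading of (20) up to an additive constant, and that the per-scale sums of squares have already been computed in one $O(n)$ pass. The function name `wavelet_whittle_newton` and the dictionary-based inputs are hypothetical. Each Newton iteration then touches only one number per scale.

```python
import math


def wavelet_whittle_newton(scale_energy, counts, d_init=0.0,
                           tol=1e-12, max_iter=100):
    """Minimize log(sum_j 2^(-2dj) S_j) + 2 d log(2) <I> by Newton steps.

    scale_energy : dict {j: S_j}, S_j = sum_k W_{j,k}^2 (computed beforehand)
    counts       : dict {j: n_j}, number of coefficients at scale j
    Assumption: at least two scales with positive energy.
    """
    scales = sorted(scale_energy)
    if len(scales) < 2:
        raise ValueError("need at least two scales")
    total = sum(counts[j] for j in scales)
    mean_j = sum(j * counts[j] for j in scales) / total   # <I>
    c = 2.0 * math.log(2.0)
    d = d_init
    for _ in range(max_iter):
        # One weight per scale: O(number of scales) work per iteration.
        a = [2.0 ** (-2.0 * d * j) * scale_energy[j] for j in scales]
        s0 = sum(a)
        m1 = sum(j * w for j, w in zip(scales, a)) / s0
        m2 = sum(j * j * w for j, w in zip(scales, a)) / s0
        grad = c * (mean_j - m1)          # first derivative of the contrast
        hess = c * c * (m2 - m1 * m1)     # second derivative (a variance)
        step = grad / hess
        d -= step
        if abs(step) < tol:
            break
    return d
```

With noiseless energies $S_j = n_j 2^{2d_0 j}$, the gradient vanishes exactly at $d = d_0$, which gives a simple sanity check of the sketch.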

5. Condition 1 holds for linear and Gaussian processes. 

Proof of Theorem 1. For any scale index $j \in \mathbb{N}$, denote by $\{h_{j,l}\}_{l\in\mathbb{Z}}$ 
the sequence $h_{j,l} := 2^{-j/2}\int \phi(t+l)\psi(2^{-j}t)\,dt$ and by 
$H_j(\lambda) := \sum_{l\in\mathbb{Z}} h_{j,l}e^{-i\lambda l}$ its associated discrete-time Fourier transform. 
Since $\phi$ and $\psi$ are compactly supported, $\{h_{j,l}\}$ has a finite number of 
nonzero coefficients. As shown by [14], Relation 13, for any sequence 
$\{x_l\}_{l\in\mathbb{Z}}$, the discrete wavelet transform coefficients at scale $j$ are given by 
$W^x_{j,k} = \sum_{l\in\mathbb{Z}} x_l h_{j,2^jk-l}$. In addition, it follows from [14], Relation 16, 
that $H_j(\lambda) = (1 - e^{-i\lambda})^M \tilde H_j(\lambda)$, where $\tilde H_j(\lambda)$ is a trigonometric 
polynomial, that is, $\tilde H_j(\lambda) = \sum_{l\in\mathbb{Z}} \tilde h_{j,l}e^{-i\lambda l}$, where 
$\{\tilde h_{j,l}\}$ has a finite number of nonzero coefficients. 
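
To make the filter representation tangible, here is a small sketch for the Haar pair $\phi = 1_{[0,1)}$, $\psi = 1_{[0,1/2)} - 1_{[1/2,1)}$, for which $h_{j,l}$ can be written down by hand from the integral above. Haar is chosen only because the arithmetic is transparent ($M = 1$); it is not claimed to satisfy the wavelet assumptions of the paper. The helper names `haar_filter` and `dwt_scale` are hypothetical.

```python
def haar_filter(j):
    """Nonzero h_{j,l} = 2^(-j/2) * int phi(t + l) psi(2^-j t) dt for Haar.

    Returns a dict {l: h_{j,l}} supported on l = -2^j + 1, ..., 0.
    """
    if j < 1:
        raise ValueError("this sketch covers scales j >= 1 only")
    amp = 2.0 ** (-j / 2.0)
    half, full = 2 ** (j - 1), 2 ** j
    # phi(t + l) is the indicator of [-l, -l + 1); psi(2^-j t) equals +1 on
    # [0, 2^(j-1)) and -1 on [2^(j-1), 2^j).
    return {l: (amp if -l < half else -amp) for l in range(-full + 1, 1)}


def dwt_scale(x, j):
    """W_{j,k} = sum_l x_l h_{j, 2^j k - l} for every k whose window fits in x."""
    h = haar_filter(j)
    full = 2 ** j
    return [
        sum(x[l] * h[full * k - l] for l in range(full * k, full * (k + 1)))
        for k in range(len(x) // full)
    ]
```

Since $H_j(0) = \sum_l h_{j,l}$ and $H_j$ carries the factor $(1 - e^{-i\lambda})^M$, the filter coefficients sum to zero, so constant sequences produce zero coefficients at every scale.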

Define $\bar\nu$ and $\tilde\nu$ as the restrictions of $\nu$ on $[-\varepsilon,\varepsilon]$ and on its 
complementary set, respectively. These definitions imply that 

(35)  $\sigma_j^2(\nu) = \sigma_j^2(\bar\nu) + \sigma_j^2(\tilde\nu)$. 

Since $\nu^* \in \mathcal{H}(\beta,\gamma,\varepsilon)$, the corresponding decomposition for $\nu^*$ reads as in (12), 
so $\bar\nu$ admits a density $\bar f(\lambda) = |1 - e^{-i\lambda}|^{-2d_0}\bar f^*(\lambda)$ on $\lambda \in [-\pi,\pi]$, where 
$\bar f^*(\lambda) = 0$ for $\lambda \notin [-\varepsilon,\varepsilon]$ and $|\bar f^*(\lambda) - f^*(0)| \le \gamma f^*(0)|\lambda|^{\beta}$ on $\lambda \in [-\varepsilon,\varepsilon]$. 
Hence, (10) holds: by [14], Theorem 1, there exists a constant $C$ such that 
for all $j \ge 0$ and $\lambda \in (-\pi,\pi)$, 

(36)  $|D_{j,u}(\lambda;\bar f) - f^*(0)D_{\infty,u}(\lambda;d_0)2^{2jd_0}| \le Cf^*(0)\gamma 2^{(2d_0-\beta)j}$. 

Recall that $D_{j,0}(\lambda;\bar f)$ is the spectral density of a stationary series with 
variance $\sigma_j^2(\bar\nu) = \int_{-\pi}^{\pi} D_{j,0}(\lambda;\bar f)\,d\lambda$. Similarly, by (14) and (15), $D_{\infty,0}(\lambda;d_0)$ 
is the spectral density of a stationary series with variance $K(d_0)$. Thus, after 
integration on $\lambda \in (-\pi,\pi)$, (36) with $u = 0$ yields 

(37)  $|\sigma_j^2(\bar\nu) - f^*(0)K(d_0)2^{2jd_0}| \le 2\pi Cf^*(0)\gamma 2^{(2d_0-\beta)j}$. 

By [14], Proposition 9, there exists a constant $C$ such that 
$|H_j(\lambda)| \le C2^{j(M+1/2)}|\lambda|^M(1 + 2^j|\lambda|)^{-\alpha-M}$ for any $\lambda \in [-\pi,+\pi]$, which implies that 

$\sigma_j^2(\tilde\nu) = 2\int_{\varepsilon}^{\pi} |H_j(\lambda)|^2\,\nu(d\lambda) \le 2C^2 2^{j(1+2M)}\int_{\varepsilon}^{\pi} \lambda^{2M}(1 + 2^j\lambda)^{-2\alpha-2M}\,\nu(d\lambda)$ 

(38)  $\le 2C^2\pi^{2M}2^{(1+2M)j}(1 + 2^j\varepsilon)^{-2\alpha-2M}\nu([\varepsilon,\pi])$ 

$= O(2^{j(1-2\alpha)}) = o(2^{j(2d_0-\beta)})$, 

since, by (W-5), $1 - 2\alpha - 2d_0 + \beta < 0$. Relations (35), (37) and (38) prove 
(8). 

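We now consider (9). We have, for all $j \ge 0$ and $n \ge 1$ (see [18], Theorem 2, 
page 34), 

$\mathrm{Var}\Big(\sum_{k=0}^{n_j-1} W_{j,k}^2\Big) = \sum_{\tau=-n_j+1}^{n_j-1} (n_j - |\tau|)\,\mathrm{Cov}(W_{j,0}^2, W_{j,\tau}^2)$ 

(39)  $= \sum_{\tau=-n_j+1}^{n_j-1} (n_j - |\tau|)\,[2\,\mathrm{Cov}^2(W_{j,0},W_{j,\tau}) + \mathrm{Cum}(W_{j,0},W_{j,0},W_{j,\tau},W_{j,\tau})]$. 

Using (16), since $M \ge k_0$, we may write 

(40)  $W_{j,k} = \sum_{t\in\mathbb{Z}} \tilde h_{j,2^jk-t}(\Delta^M X)_t = \sum_{t\in\mathbb{Z}} b_{j,2^jk-t}Z_t$, 

where $b_{j,\cdot} := \tilde h_{j,\cdot} \star (\Delta^{M-k_0}a)$ belongs to $\ell^2(\mathbb{Z})$. By (17), we thus obtain 

$\mathrm{Cum}(W_{j,0},W_{j,0},W_{j,\tau},W_{j,\tau}) = (\mathbb{E}[Z_1^4] - 3)\sum_{t\in\mathbb{Z}} b_{j,-t}^2\, b_{j,2^j\tau-t}^2$, 

which, in turn, implies that 

(41)  $\sum_{\tau\in\mathbb{Z}} |\mathrm{Cum}(W_{j,0},W_{j,0},W_{j,\tau},W_{j,\tau})| = |\mathbb{E}[Z_1^4] - 3|\sum_{t,\tau\in\mathbb{Z}} b_{j,-t}^2\, b_{j,2^j\tau-t}^2 \le |\mathbb{E}[Z_1^4] - 3|\,\sigma_j^4(\nu)$, 

since, by (40), $\sum_t b_{j,t}^2 = \sigma_j^2(\nu)$. 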

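We shall now bound $\sum_{\tau=-n_j+1}^{n_j-1} \mathrm{Cov}^2(W_{j,0},W_{j,\tau})$. One can define uncorrelated 
wavelet coefficients $\{\bar W_{j,k}\}$ and $\{\tilde W_{j,k}\}$ associated with the generalized 
spectral measures $\bar\nu$ and $\tilde\nu$, respectively, and such that $W_{j,k} = \bar W_{j,k} + \tilde W_{j,k}$ 
for all $j \ge 0$ and $k \in \mathbb{Z}$. Therefore, 

$\mathrm{Cov}^2(W_{j,0},W_{j,\tau}) = \mathrm{Cov}^2(\bar W_{j,0},\bar W_{j,\tau}) + \mathrm{Cov}^2(\tilde W_{j,0},\tilde W_{j,\tau}) + 2\,\mathrm{Cov}(\bar W_{j,0},\bar W_{j,\tau})\,\mathrm{Cov}(\tilde W_{j,0},\tilde W_{j,\tau})$. 

By (8), $\sigma_j^2(\nu) \asymp 2^{2jd_0}$. Therefore, by (36) and using [14], Proposition 3, 
equation (30), for all $j \ge 0$, $\{\sigma_j^{-1}(\nu)\bar W_{j,k}, k \in \mathbb{Z}\}$ is a stationary process whose 
spectral density is bounded above by a constant independent of $j$. Parseval's 
theorem implies that $\sup_{j\ge1} \sigma_j^{-4}(\nu)\sum_{\tau\in\mathbb{Z}} \mathrm{Cov}^2(\bar W_{j,0},\bar W_{j,\tau}) < \infty$, hence 

(42)  $\sup_{n\ge1}\ \sup_{j=1,\dots,J_n}\ n_j^{-1}\sigma_j^{-4}(\nu)\sum_{\tau=-n_j+1}^{n_j-1} (n_j - |\tau|)\,\mathrm{Cov}^2(\bar W_{j,0},\bar W_{j,\tau}) < \infty$. 

Now, consider $\{\tilde W_{j,k}\}$. The Cauchy-Schwarz inequality and the stationarity 
of the within-scale process imply that $\mathrm{Cov}^2(\tilde W_{j,0},\tilde W_{j,\tau}) \le \mathrm{Var}^2(\tilde W_{j,0}) = 
\sigma_j^4(\tilde\nu) = O(2^{2j(1-2\alpha)})$, by (38), and since $\sigma_j^2(\nu) \asymp 2^{2jd_0}$, we get 

(43)  $\sup_{n\ge1}\ \sup_{j=1,\dots,J_n}\ \frac{2^{2j(2\alpha+2d_0-1)}}{n_j^2\sigma_j^4(\nu)}\sum_{\tau=-n_j+1}^{n_j-1} (n_j - |\tau|)\,\mathrm{Cov}^2(\tilde W_{j,0},\tilde W_{j,\tau}) < \infty$. 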

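Finally, using the fact that, for any $j \ge 1$, $D_{j,0}(\lambda;\bar f)$ is the spectral density 
of the process $\{\bar W_{j,k}\}$ and denoting by $\tilde\nu_j$ the spectral measure of $\{\tilde W_{j,k}\}_{k\in\mathbb{Z}}$, 
it is straightforward to show that 

$A(n,j) := \sum_{\tau=-n_j+1}^{n_j-1} (n_j - |\tau|)\,\mathrm{Cov}(\bar W_{j,0},\bar W_{j,\tau})\,\mathrm{Cov}(\tilde W_{j,0},\tilde W_{j,\tau})$ 

$= \int_{-\pi}^{\pi}\int_{-\pi}^{\pi} D_{j,0}(\lambda';\bar f)\,\Big|\sum_{k=0}^{n_j-1} e^{ik(\lambda-\lambda')}\Big|^2\,d\lambda'\,\tilde\nu_j(d\lambda)$ 

$\le 2\pi n_j\,\sigma_j^2(\tilde\nu)\,\|D_{j,0}(\cdot;\bar f)\|_{\infty}$. 

This implies that $A(n,j) \ge 0$ and using (38), (36) and $\sigma_j^2(\nu) \asymp 2^{2jd_0}$, we get 

(44)  $\sup_{n\ge1}\ \sup_{j=1,\dots,J_n}\ \frac{2^{j(2\alpha+2d_0-1)}}{n_j\sigma_j^4(\nu)}\,|A(n,j)| < \infty$. 

Using the fact that $W_{j,k} = \bar W_{j,k} + \tilde W_{j,k}$ and that $\bar W_{j,k}$ and $\tilde W_{j,k}$ are 
uncorrelated, (39), (41), (42), (43), (44) and $1 - 2\alpha - 2d_0 < -\beta < 0$ yield (9). □ 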

Remark 10. If $\varepsilon = \pi$ in the assumptions of Theorem 1, then, in the 
above proof, $\tilde W_{j,k} = 0$ for all $(j,k)$, so not only (9) holds, but also the stronger 
relation 

(45)  $\sup_{n\ge1}\ \sup_{j=1,\dots,J_n}\ n_j^{-1}\,\mathrm{Var}\Big(\sum_{k=0}^{n_j-1} \frac{W_{j,k}^2}{\sigma_j^2(\nu)}\Big) < \infty$. 

Proof of Corollary 2. Condition 1 holds because Theorem 1 applies 
to a Gaussian process. Moreover, since its fourth order cumulants are zero, 
the relation $W_{j,k}^2 = \bar W_{j,k}^2 + \tilde W_{j,k}^2 + 2\bar W_{j,k}\tilde W_{j,k}$, (43) and (44) yield 

$\mathrm{Var}\Big(\sum_{k=0}^{n_j-1} (W_{j,k}^2 - \bar W_{j,k}^2)\Big) \le C\Big[\frac{n_j^2\sigma_j^4(\nu)}{2^{2j(2\alpha+2d_0-1)}} + \frac{n_j\sigma_j^4(\nu)}{2^{j(2\alpha+2d_0-1)}}\Big]$, 

where $C$ is a positive constant. Since $\bar W_{j,k}$ and $\tilde W_{j,k}$ are uncorrelated, 
$\mathbb{E}[W_{j,k}^2 - \bar W_{j,k}^2] = \sigma_j^2(\tilde\nu)$, hence the last display, $\sigma_j^2(\nu) \asymp 2^{2jd_0}$ and (38) yield (18). □ 

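6. Asymptotic behavior of the contrast process. We decompose the 
contrast (20) into a sum of a (deterministic) function of $d$ and a random process 
indexed by $d$, 

(46)  $\tilde L_{\mathcal I}(d) = L_{\mathcal I}(d) + E_{\mathcal I}(d) + \log(|\mathcal I|\sigma^2 2^{2d_0\langle\mathcal I\rangle})$, 

where the log term does not depend on $d$ (and thus may be discarded) and 

(47)  $L_{\mathcal I}(d) := \log\Big(\frac{1}{|\mathcal I|}\sum_{(j,k)\in\mathcal I} 2^{2(d_0-d)j}\Big) - \frac{1}{|\mathcal I|}\sum_{(j,k)\in\mathcal I} \log(2^{2(d_0-d)j})$, 

(48)  $E_{\mathcal I}(d) := \log\Big[1 + \sum_{(j,k)\in\mathcal I} \frac{2^{2(d_0-d)j}}{\sum_{(j',k')\in\mathcal I} 2^{2(d_0-d)j'}}\Big(\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} - 1\Big)\Big]$, 

with $\sigma^2$ defined in (8). 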
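
The decomposition (46) is an algebraic identity and can be checked numerically. The sketch below assumes that the contrast (20) has the form $\tilde L_{\mathcal I}(d) = \log\big(\sum_{(j,k)\in\mathcal I} 2^{-2dj}W_{j,k}^2\big) + 2d\log(2)\langle\mathcal I\rangle$, which is the form that makes (46) through (48) hold exactly; all function names are hypothetical, and the squared coefficients are passed as a dictionary `{j: [W_{j,k}^2, ...]}`.

```python
import math


def _size_and_mean_scale(W2):
    """Return |I| and <I> for squared coefficients W2 = {j: [W_{j,k}^2, ...]}."""
    size = sum(len(v) for v in W2.values())
    return size, sum(j * len(v) for j, v in W2.items()) / size


def contrast(W2, d):
    """Assumed form of (20): log(sum 2^(-2dj) W^2) + 2 d log(2) <I>."""
    _, mean_j = _size_and_mean_scale(W2)
    total = sum(2.0 ** (-2.0 * d * j) * w for j, v in W2.items() for w in v)
    return math.log(total) + 2.0 * d * math.log(2.0) * mean_j


def L_det(W2, d, d0):
    """Deterministic part (47); depends on W2 only through the index set."""
    size, mean_j = _size_and_mean_scale(W2)
    avg = sum(len(v) * 2.0 ** (2.0 * (d0 - d) * j) for j, v in W2.items()) / size
    return math.log(avg) - 2.0 * (d0 - d) * math.log(2.0) * mean_j


def E_rand(W2, d, d0, sigma2):
    """Random part (48)."""
    denom = sum(len(v) * 2.0 ** (2.0 * (d0 - d) * j) for j, v in W2.items())
    acc = sum(
        2.0 ** (2.0 * (d0 - d) * j) / denom * (w / (sigma2 * 2.0 ** (2.0 * d0 * j)) - 1.0)
        for j, v in W2.items() for w in v
    )
    return math.log(1.0 + acc)


def log_const(W2, d0, sigma2):
    """The d-free term log(|I| sigma^2 2^(2 d0 <I>)) in (46)."""
    size, mean_j = _size_and_mean_scale(W2)
    return math.log(size * sigma2) + 2.0 * d0 * math.log(2.0) * mean_j
```

For any positive squared coefficients, `contrast(W2, d)` and `L_det(W2, d, d0) + E_rand(W2, d, d0, sigma2) + log_const(W2, d0, sigma2)` should agree up to rounding, whatever values of `d0` and `sigma2` are used.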



Proposition 6. For any finite and nonempty set $\mathcal I \subset \mathbb{N}\times\mathbb{Z}$, the function 
$d \mapsto L_{\mathcal I}(d)$ is nonnegative, convex and vanishes at $d = d_0$. Moreover, for 
any sequence $\{L_n\}$ such that $n2^{-L_n} \to \infty$ as $n \to \infty$, and for any constants 
$d_{\min}$ and $d_{\max}$ in $\mathbb{R}$ satisfying $d_0 - 1/2 < d_{\min} < d_{\max}$, 

(49)  $\liminf_{n\to\infty}\ \inf_{d\in[d_{\min},d_{\max}]}\ \inf_{j_1=L_n+1,\dots,J_n}\ \ddot L_{\mathcal I_n(L_n,j_1)}(d) > 0$, 

where $\mathcal I_n$ is defined in (22) and $\ddot L_{\mathcal I}$ denotes the second derivative of $L_{\mathcal I}$. 

Proof. By concavity of the log function, $L_{\mathcal I}(d) \ge 0$ and is zero if $d = d_0$. 
If $\mathcal I = \mathcal I_n(L_n,j_1)$ with $j_1 \ge L_n + 1$, one can compute $\ddot L_{\mathcal I}(d)$ and show that 
it can be expressed as $\ddot L_{\mathcal I}(d) = (2\log(2))^2\,\mathrm{Var}(N)$, where $N$ is an 
integer-valued random variable such that 

$\mathbb{P}(N = j) = 2^{2(d_0-d)j}n_j \Big/ \sum_{j'=L_n}^{j_1} 2^{2(d_0-d)j'}n_{j'}$ for $j \in \{L_n,\dots,j_1\}$. 

Let $d \ge d_{\min} > d_0 - 1/2$. Then, 

$\mathbb{P}(N = L_n) \ge (1 - 2^{2(d_0-d_{\min})-1})\{1 - T2^{L_n}(n - T + 1)^{-1}\}$. 

Since $n2^{-L_n} \to \infty$, the term between the brackets tends to 1 as $n \to \infty$. 
Hence, for $n$ large enough, we have $\inf_{d\ge d_{\min}} \mathbb{P}(N = L_n) \ge (1 - 2^{2(d_0-d_{\min})-1})/2$. 
Similarly, one finds, for $n$ large enough, $\inf_{d\in[d_{\min},d_{\max}]} \mathbb{P}(N = L_n + 1) \ge 
(1 - 2^{2(d_0-d_{\min})-1})2^{2(d_0-d_{\max})-1}/2$. Hence, 

$\inf_{d\in[d_{\min},d_{\max}]} \mathrm{Var}(N) \ge \{L_n - \mathbb{E}(N)\}^2\,\mathbb{P}(N = L_n) + \{L_n + 1 - \mathbb{E}(N)\}^2\,\mathbb{P}(N = L_n + 1)$ 

$\ge (1 - 2^{2(d_0-d_{\min})-1})2^{2(d_0-d_{\max})-2} \times (\{L_n - \mathbb{E}(N)\}^2 + \{L_n + 1 - \mathbb{E}(N)\}^2)$ 

$\ge (1 - 2^{2(d_0-d_{\min})-1})2^{2(d_0-d_{\max})-4}$, 

where the last inequality is obtained by observing that either $\mathbb{E}(N) - L_n \ge 
1/2$ or $L_n + 1 - \mathbb{E}(N) \ge 1/2$. □ 
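
The identity $\ddot L_{\mathcal I}(d) = (2\log(2))^2\,\mathrm{Var}(N)$ used in this proof can be checked against a finite-difference second derivative. The sketch below is only an illustration: it takes arbitrary per-scale counts $n_j$ as a dictionary, and the function names are hypothetical.

```python
import math


def deterministic_contrast(d, d0, counts):
    """L_I(d) from (47), with counts = {j: n_j} describing the index set."""
    size = sum(counts.values())
    mean_j = sum(j * n for j, n in counts.items()) / size
    avg = sum(n * 2.0 ** (2.0 * (d0 - d) * j) for j, n in counts.items()) / size
    return math.log(avg) - 2.0 * (d0 - d) * math.log(2.0) * mean_j


def variance_of_N(d, d0, counts):
    """Variance of N, where P(N = j) is proportional to 2^(2(d0-d)j) n_j."""
    weights = {j: n * 2.0 ** (2.0 * (d0 - d) * j) for j, n in counts.items()}
    total = sum(weights.values())
    mean = sum(j * w for j, w in weights.items()) / total
    return sum((j - mean) ** 2 * w for j, w in weights.items()) / total


def second_derivative_fd(d, d0, counts, h=1e-4):
    """Central finite-difference approximation of the second derivative."""
    f = lambda x: deterministic_contrast(x, d0, counts)
    return (f(d + h) - 2.0 * f(d) + f(d - h)) / (h * h)
```

Comparing `second_derivative_fd(d, d0, counts)` with `(2 * math.log(2)) ** 2 * variance_of_N(d, d0, counts)` at a few values of `d` gives agreement up to the finite-difference error, and the variance is strictly positive as soon as two scales are present, which is the source of the convexity.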

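We now show that the random component $E_{\mathcal I}(d)$ of the contrast (46) 
tends to 0 uniformly in $d$. For all $\rho > 0$, $q \ge 0$ and $\delta \in \mathbb{R}$, define the set of 
real-valued sequences 

(50)  $\mathcal{B}(\rho,q,\delta) := \{\{\mu_j\}_{j\ge0} : |\mu_j| \le \rho(1 + j^q)2^{j\delta}$ for all $j \ge 0\}$. 

Define, for any $n \ge 1$, any sequence $\mu := \{\mu_j\}_{j\ge0}$ and $0 \le j_0 \le j_1 \le J_n$, 

(51)  $\tilde S_{n,j_0,j_1}(\mu) := \sum_{j=j_0}^{j_1} \mu_{j-j_0}\sum_{k=0}^{n_j-1} \Big[\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} - 1\Big]$. 

Proposition 7. Under Condition 1, for any $q \ge 0$ and $\delta < 1$, there 
exists $C > 0$ such that for all $\rho > 0$, $n \ge 1$ and $j_0 = 1,\dots,J_n$, 

(52)  $\Big\{\mathbb{E}\sup_{\mu\in\mathcal{B}(\rho,q,\delta)}\ \sup_{j_1=j_0,\dots,J_n} |\tilde S_{n,j_0,j_1}(\mu)|^2\Big\}^{1/2} \le C\rho\,n2^{-j_0}\,[H_{q,\delta}(n2^{-j_0}) + 2^{-\beta j_0}]$, 

where, for all $x \ge 0$, 

$H_{q,\delta}(x) := x^{-1/2}$ if $\delta < 1/2$, 
$H_{q,\delta}(x) := \log^{q+1}(2 + x)\,x^{-1/2}$ if $\delta = 1/2$, 
$H_{q,\delta}(x) := \log^{q}(2 + x)\,x^{\delta-1}$ if $\delta > 1/2$. 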



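Proof. We set $\rho = 1$ without loss of generality. We write 

$\tilde S_{n,j_0,j_1}(\mu) = \sum_{j=j_0}^{j_1} \mu_{j-j_0}\frac{\sigma_j^2(\nu)}{\sigma^2 2^{2d_0j}}\sum_{k=0}^{n_j-1} \Big[\frac{W_{j,k}^2}{\sigma_j^2(\nu)} - 1\Big] + \sum_{j=j_0}^{j_1} n_j\mu_{j-j_0}\Big[\frac{\sigma_j^2(\nu)}{\sigma^2 2^{2d_0j}} - 1\Big]$ 

and denote the two terms of the right-hand side of this equality as $\tilde S^{(0)}_{n,j_0,j_1}(\mu)$ 
and $\tilde S^{(1)}_{n,j_0,j_1}(\mu)$, respectively. By (8), $C_1 := \sup_{j\ge0} 2^{\beta j}|\sigma_j^2(\nu)/(\sigma^2 2^{2d_0j}) - 1| < 
\infty$, which implies $\sup_{j\ge0} |\sigma_j^2(\nu)/(\sigma^2 2^{2d_0j})| \le 1 + C_1$. Hence, if $\mu \in \mathcal{B}(1,q,\delta)$, 
then 

$|\tilde S^{(0)}_{n,j_0,j_1}(\mu)| \le (1 + C_1)\sum_{j=j_0}^{j_1} (1 + (j - j_0)^q)2^{(j-j_0)\delta}\,\Big|\sum_{k=0}^{n_j-1} \Big[\frac{W_{j,k}^2}{\sigma_j^2(\nu)} - 1\Big]\Big|$. 

Using the Minkowski inequality and $n_j \le n2^{-j}$, (9) implies that there exists 
a constant $C_2$ such that 

(53)  $\Big\{\mathbb{E}\sup_{\mu\in\mathcal{B}(1,q,\delta)}\ \sup_{j_1=j_0,\dots,J_n} |\tilde S^{(0)}_{n,j_0,j_1}(\mu)|^2\Big\}^{1/2} \le (1 + C_1)C_2\sum_{j=j_0}^{J_n} (1 + (j - j_0)^q)2^{(j-j_0)\delta}\,[(n2^{-j})^{1/2} + n2^{-(1+\beta)j}]$. 

The sum over the first term is $O(n2^{-j_0}H_{q,\delta}(n2^{-j_0}))$ since $J_n - j_0 \asymp \log_2 n + 
\log_2 2^{-j_0} = \log_2(n2^{-j_0})$. The sum over the second term is $O(n2^{-(1+\beta)j_0})$ since 
$\delta < 1$ and $1 + \beta > 1$, so (53) is $O((n2^{-j_0})\{H_{q,\delta}(n2^{-j_0}) + 2^{-\beta j_0}\})$ since $2^{J_n} \asymp n$. 
Now, by the definition of $C_1$ above and since $n_j \le n2^{-j}$, we get 

$\sup_{\mu\in\mathcal{B}(1,q,\delta)}\ \sup_{j_1=j_0,\dots,J_n} |\tilde S^{(1)}_{n,j_0,j_1}(\mu)| \le C_1 n\sum_{j=j_0}^{J_n} (1 + (j - j_0)^q)2^{(j-j_0)\delta}2^{-(1+\beta)j}$, 

which is $O(n2^{-(1+\beta)j_0})$. The two last displays yield (52). □ 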

Corollary 8. Let $\{L_n\}$ be a sequence such that $L_n^{-1} + (n2^{-L_n})^{-1} \to 0$ 
as $n \to \infty$ and let $E_{\mathcal I}(d)$ be defined as in (48). Condition 1 then implies that 
as $n \to \infty$: 

(a) for any $\ell > 0$, 

$\sup_{d\in\mathbb{R}} |E_{\mathcal I_n(L_n,L_n+\ell)}(d)| = O_P((n2^{-L_n})^{-1/2} + 2^{-\beta L_n})$; 

(b) for all $d_{\min} > d_0 - 1/2$, setting $\delta = 2(d_0 - d_{\min})$, 

$\sup_{d\ge d_{\min}}\ \sup_{j_1=L_n,\dots,J_n} |E_{\mathcal I_n(L_n,j_1)}(d)| = O_P(H_{0,\delta}(n2^{-L_n}) + 2^{-\beta L_n})$. 

Proof. The definitions (48) and (51) imply that, for $0 \le j_0 \le j_1 \le J_n$, 

$E_{\mathcal I_n(j_0,j_1)}(d) = \log[1 + (n2^{-j_0})^{-1}\tilde S_{n,j_0,j_1}[\mu(d,j_0,j_1)]]$, 

where $\mu(d,j_0,j_1)$ is the sequence $\{\mu_j(d,j_0,j_1)\}_{j\ge0}$ defined by 

(54)  $\mu_j(d,j_0,j_1) := n2^{-j_0}\,\frac{2^{2(d_0-d)(j+j_0)}}{\sum_{j'=j_0}^{j_1} 2^{2(d_0-d)j'}n_{j'}}\,1(j \le j_1 - j_0)$. 

The bounds (a) and (b) then follow from Proposition 7, the Markov inequality 
and the following bounds. 

Part (a). In this case, we apply Proposition 7 with $\delta = 0$. Indeed, using 
the fact that $\mu_j(d,L_n,L_n+\ell)\,n_{L_n+j} \le n2^{-L_n}$ for all $j = 0,\dots,\ell$ and is zero 
otherwise, we have that $\mu_j \le n2^{-L_n}/n_{L_n+\ell} \to 2^{\ell}$ as $n \to \infty$, since $n2^{-L_n} \to 
\infty$. Then, for large enough $n$, $\mu(d,L_n,L_n+\ell) \in \mathcal{B}(2^{\ell+1},0,0)$ for all $d \in \mathbb{R}$. 

Part (b). Here, we still apply Proposition 7, but with $\delta = 2(d_0 - d_{\min}) < 1$, 
implying that $H_{0,\delta}(n2^{-L_n}) \to 0$. Indeed, since the denominator of the ratio 
appearing in (54) is at least $2^{2(d_0-d)L_n}n_{L_n}$, we have, for all $j \ge 0$, 
$\sup_{j_1\ge L_n}\sup_{d\ge d_{\min}} |\mu_j(d,L_n,j_1)| \le n2^{-L_n}n_{L_n}^{-1}2^{2(d_0-d_{\min})j}$. Since $n2^{-L_n} \sim n_{L_n}$ as 
$n \to \infty$, we get that, for large enough $n$, $\mu(d,L_n,j_1) \in \mathcal{B}(2,0,\delta)$ for all 
$d \ge d_{\min}$ and $j_1 \ge L_n$. □ 

7. Weak consistency. We now establish a preliminary result on the 
consistency of $\hat d$. It does not provide an optimal rate, but it will be used in the 
proof of Theorem 3, which provides the optimal rate. By the definition of $\hat d$ 
and (46), we have 

(55)  $0 \ge \tilde L_{\mathcal I}(\hat d_{\mathcal I}) - \tilde L_{\mathcal I}(d_0) = L_{\mathcal I}(\hat d_{\mathcal I}) + E_{\mathcal I}(\hat d_{\mathcal I}) - E_{\mathcal I}(d_0)$. 

The basic idea for proving consistency is to show that (1) the function $d \mapsto 
L_{\mathcal I}(d)$ behaves as $(d - d_0)^2$ up to a multiplicative positive constant and (2) the 
function $d \mapsto E_{\mathcal I}(d)$ tends to zero in probability, uniformly in $d$. Proposition 6 
will prove (1) and Corollary 8 will yield (2). 

Proposition 9 (Weak consistency). Let $\{L_n\}$ be a sequence such that 
$L_n^{-1} + (n2^{-L_n})^{-1} \to 0$ as $n \to \infty$. Condition 1 implies that as $n \to \infty$, 

(56)  $\sup_{j_1=L_n+1,\dots,J_n} |\hat d_{\mathcal I_n(L_n,j_1)} - d_0| = O_P\{(n2^{-L_n})^{-1/4} + 2^{-\beta L_n/2}\}$. 

Proof. The proof proceeds in four steps. 

Step 1. For any positive integer $\ell$, $|\hat d_{\mathcal I_n(L_n,L_n+\ell)} - d_0| = o_P(1)$. 

Step 2. There exists $d_{\min} \in (d_0 - 1/2, d_0)$ such that, as $n \to \infty$, 

$\mathbb{P}\{\inf_{j_1=L_n+2,\dots,J_n} \hat d_{\mathcal I_n(L_n,j_1)} \le d_{\min}\} \to 0$. 

Combining this with Step 1 yields $\mathbb{P}\{\inf_{j_1=L_n+1,\dots,J_n} \hat d_{\mathcal I_n(L_n,j_1)} \le d_{\min}\} \to 0$. 

Step 3. For any $d_{\max} > d_0$, as $n \to \infty$, $\mathbb{P}\{\sup_{j_1=L_n+1,\dots,J_n} \hat d_{\mathcal I_n(L_n,j_1)} \ge d_{\max}\} \to 0$. 

Step 4. Define $H_{0,\delta}$ as in Proposition 7. For all $d_{\min} \in (d_0 - 1/2, d_0)$ and 
$d_{\max} > d_0$, setting $\delta = 2(d_0 - d_{\min})$, we have 

$\sup_{j_1=L_n+1,\dots,J_n} [1_{[d_{\min},d_{\max}]}(\hat d_{\mathcal I_n(L_n,j_1)})(\hat d_{\mathcal I_n(L_n,j_1)} - d_0)^2] = O_P(H_{0,\delta}(n2^{-L_n}) + 2^{-\beta L_n})$. 

Before proving these four steps, let us briefly explain how they yield (56). 
First, observe that they imply that $\sup_{j_1=L_n+1,\dots,J_n} |\hat d_{\mathcal I_n(L_n,j_1)} - d_0| = o_P(1)$. 
Then, applying Step 4 again with $d_{\min} \in (d_0 - 1/4, d_0)$, so that $H_{0,\delta}(x) = 
x^{-1/2}$, we obtain (56). □ 
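
The last step of this argument can be spelled out as follows; this is only a restatement of the passage from Step 4 to (56), with the uniformity in $j_1$ written explicitly.

```latex
% Step 4 applied with d_min in (d_0 - 1/4, d_0), so that delta = 2(d_0 - d_min) < 1/2.
\begin{align*}
H_{0,\delta}(x) &= x^{-1/2}
  \quad\text{because } \delta < 1/2, \\
\sup_{j_1=L_n+1,\dots,J_n} \big(\hat d_{\mathcal I_n(L_n,j_1)} - d_0\big)^2
  &= O_P\big((n2^{-L_n})^{-1/2} + 2^{-\beta L_n}\big)
  \quad\text{on an event whose probability tends to } 1, \\
\sup_{j_1=L_n+1,\dots,J_n} \big|\hat d_{\mathcal I_n(L_n,j_1)} - d_0\big|
  &= O_P\big((n2^{-L_n})^{-1/4} + 2^{-\beta L_n/2}\big).
\end{align*}
% The event is {d_min <= \hat d <= d_max for all j_1}, which has probability
% tending to 1 by the uniform consistency; the last line uses
% sqrt(a + b) <= sqrt(a) + sqrt(b) for a, b >= 0.
```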

Proof of Step 1. Using standard arguments for contrast estimation 
(similar to those detailed in Step 3 and Step 4 below), this step is a direct 
consequence of Proposition 6 and Corollary 8(a). □ 

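Proof of Step 2. Using (20), we have, for all $d \in \mathbb{R}$, 

$\tilde L_{\mathcal I}(d) - \tilde L_{\mathcal I}(d_0) = \log\Big(\sum_{(j,k)\in\mathcal I} 2^{2(j-\langle\mathcal I\rangle)(d_0-d)}\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} \Big/ \sum_{(j,k)\in\mathcal I} \frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}}\Big)$. 

For some $d_{\min} \in (d_0 - 1/2, d_0)$ to be specified later, we set 

(57)  $w_{\mathcal I,j}(d) := 2^{2(j-\langle\mathcal I\rangle)(d_0-d)}\,1\{j \le \langle\mathcal I\rangle\} + 2^{2(j-\langle\mathcal I\rangle)(d_0-d_{\min})}\,1\{j > \langle\mathcal I\rangle\}$, 

so that for all $j$ and $d \le d_{\min}$, $w_{\mathcal I,j}(d) \le 2^{2(j-\langle\mathcal I\rangle)(d_0-d)}$. We further obtain, 
for all $d \le d_{\min}$, 

(58)  $\tilde L_{\mathcal I}(d) - \tilde L_{\mathcal I}(d_0) \ge \log\frac{\Sigma_{\mathcal I}(d) + A_{\mathcal I}(d)}{1 + B_{\mathcal I}}$, 

where $\Sigma_{\mathcal I}(d) := |\mathcal I|^{-1}\sum_{(j,k)\in\mathcal I} w_{\mathcal I,j}(d)$, 
$A_{\mathcal I}(d) := |\mathcal I|^{-1}\sum_{(j,k)\in\mathcal I} w_{\mathcal I,j}(d)\big(\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} - 1\big)$ and 
$B_{\mathcal I} := |\mathcal I|^{-1}\sum_{(j,k)\in\mathcal I} \big(\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} - 1\big)$. We will show that $d_{\min} \in 
(d_0 - 1/2, d_0)$ may be chosen in such a way that 

(59)  $\liminf_{n\to\infty}\ \inf_{d\le d_{\min}}\ \inf_{j_1=L_n+2,\dots,J_n} \Sigma_{\mathcal I_n(L_n,j_1)}(d) > 1$, 

(60)  $\sup_{j_1=L_n+2,\dots,J_n} \Big(\sup_{d\le d_{\min}} |A_{\mathcal I_n(L_n,j_1)}(d)| + |B_{\mathcal I_n(L_n,j_1)}|\Big) = o_P(1)$. 

By (55), $\tilde L_{\mathcal I}(\hat d_{\mathcal I}) \le \tilde L_{\mathcal I}(d_0)$. Then, $\inf_{j_1=L_n+2,\dots,J_n} \hat d_{\mathcal I_n(L_n,j_1)} \le d_{\min}$ would 
imply that there exists $j_1 = L_n+2,\dots,J_n$ such that 
$\inf_{d\le d_{\min}} \tilde L_{\mathcal I_n(L_n,j_1)}(d) - \tilde L_{\mathcal I_n(L_n,j_1)}(d_0) \le 0$, an event whose probability tends to 
zero as a consequence of (58)-(60). Hence, these equations yield Step 2. It thus 
remains to show that (59) and (60) hold. By Lemma 13, since $n2^{-L_n} \to \infty$, we 
have, for $n$ large enough, 

(61)  $\sup_{j_1=L_n,\dots,J_n} \langle\mathcal I_n(L_n,j_1)\rangle < L_n + 1$. 

Using $w_{\mathcal I_n(L_n,j_1),L_n}(d) \ge 0$ and, for $n$ large enough, $w_{\mathcal I_n(L_n,j_1),j}(d) \ge 
2^{2(j-(L_n+1))(d_0-d_{\min})}$ for $j \ge L_n + 1$, we get, for all $d \le d_{\min} < d_0$ and 
$j_1 = L_n+2,\dots,J_n$, 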

2 -2(X n +l)(do-dmin) L «+ 2 

Since n2~ Ln — ► oo, using Lemma 13, n x 2 Jn and the fact that 2(d — d min ) — 
1 < 0, straightforward computations give that the LHS in the previous dis- 
play is asymptotically equivalent to (i_2{ 2 ( d o-rfmm)-i}2)/(4_ 2 2(do-dmm)+i). 

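There are values of $d_{\min} \in (d_0 - 1/2, d_0)$ such that this ratio is strictly larger 
than 1. For such a choice and for $n$ large enough, (59) holds. 

We now check (60). Observing that, for $\mathcal I_n = \mathcal I_n(L_n,j_1)$ and using the 
notation (51), $A_{\mathcal I_n}(d) = |\mathcal I_n|^{-1}\tilde S_{n,L_n,j_1}(\{w_{\mathcal I_n,L_n+j}(d)\}_{j\ge0})$ and 
$B_{\mathcal I_n} = |\mathcal I_n|^{-1}\tilde S_{n,L_n,j_1}(1)$, where $1$ denotes the constant sequence equal to one, 
the bound (60) follows from $|\mathcal I_n| \ge n_{L_n} \sim n2^{-L_n}$ and Proposition 7 since, for 
all $d \le d_{\min}$ and $j \ge 0$, $w_{\mathcal I_n,L_n+j}(d) \le 2^{2(L_n+j-\langle\mathcal I_n\rangle)(d_0-d_{\min})} \le 2^{2j(d_0-d_{\min})}$, 
which shows that $\{w_{\mathcal I_n,L_n+j}(d)\}_{j\ge0}$ belongs to $\mathcal{B}(1,0,\delta)$ with $\delta = 2(d_0 - 
d_{\min}) < 1$. □ 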

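Proof of Step 3. By (55), $L_{\mathcal I}(\hat d_{\mathcal I}) \le E_{\mathcal I}(d_0) - E_{\mathcal I}(\hat d_{\mathcal I})$, so, for any 
$d_{\max} > d_0$, one has $\inf_{d\ge d_{\max}} L_{\mathcal I}(d) \le 2\sup_{d\ge d_0} |E_{\mathcal I}(d)|$ on the event $\{\hat d_{\mathcal I} \ge 
d_{\max}\}$. By Proposition 6, there exists $c > 0$ such that, for $n$ large enough, 
$L_{\mathcal I_n(L_n,j_1)}(d) \ge c$ uniformly for $d \ge d_{\max}$ and $j_1 = L_n+1,\dots,J_n$. Thus, for $n$ 
large enough, 

$\mathbb{P}\Big\{\sup_{j_1=L_n+1,\dots,J_n} \hat d_{\mathcal I_n(L_n,j_1)} \ge d_{\max}\Big\} \le \mathbb{P}\Big\{2\sup_{d\ge d_0}\ \sup_{j_1=L_n+1,\dots,J_n} |E_{\mathcal I_n(L_n,j_1)}(d)| \ge c\Big\}$, 

which tends to 0 as $n \to \infty$, by Corollary 8(b). □ 

Proof of Step 4. Equation (55) implies that $1_{[d_{\min},d_{\max}]}(\hat d_{\mathcal I})L_{\mathcal I}(\hat d_{\mathcal I}) \le 
2\sup_{d\ge d_{\min}} |E_{\mathcal I}(d)|$. Let $c$ denote the liminf in the left-hand side of (49) 
for the present choice of $d_{\min}$ and $d_{\max}$. Proposition 6 and a second order 
Taylor expansion of $L_{\mathcal I}$ around $d_0$ give that, for $n$ large enough, for all $j_1 = 
L_n+1,\dots,J_n$ and $d \in [d_{\min},d_{\max}]$, $L_{\mathcal I_n(L_n,j_1)}(d) \ge (c/4)(d - d_0)^2$. Hence, for 
$n$ large enough, 

$\sup_{j_1=L_n+1,\dots,J_n} 1_{[d_{\min},d_{\max}]}(\hat d_{\mathcal I_n(L_n,j_1)})(\hat d_{\mathcal I_n(L_n,j_1)} - d_0)^2 \le \frac{8}{c}\sup_{d\ge d_{\min}}\ \sup_{j_1=L_n+1,\dots,J_n} |E_{\mathcal I_n(L_n,j_1)}(d)|$. 

Corollary 8(b) then yields Step 4. □ 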

Remark 11. Proposition 9 implies that if $L_n < U_n \le J_n$ with $L_n^{-1} + 
(n2^{-L_n})^{-1} \to 0$ as $n \to \infty$, then $\hat d_{\mathcal I_n(L_n,U_n)}$ is a consistent estimator of $d_0$. 
While the rate provided by (56) is not optimal, it will be used to derive the 
optimal rates of convergence (Theorem 3). 

8. Proofs of Theorems 3 and 5. 

Notational convention. In the following, $\{L_n\}$ and $\{U_n\}$ are two 
sequences satisfying (23). The only difference between the two following 
settings (S-1) (where $U_n - L_n$ is fixed) and (S-2) (where $U_n - L_n \to \infty$) lies 
in the computations of the asymptotic variances in Theorem 5 (CLT). Hence, 
we shall hereafter write $L$, $U$, $\mathcal I_n$, $\hat d_n$, $\hat S_n$ and $\tilde S_n$ for $L_n$, $U_n$, $\mathcal I_n(L_n,U_n)$, 
$\hat d_{\mathcal I_n(L_n,U_n)}$, $\hat S_{\mathcal I_n(L_n,U_n)}$ and $\tilde S_{n,L_n,U_n}$, respectively. 

We will use the explicit notation when the distinction between these two 
cases (S-1) and (S-2) is necessary, namely, when computing the limiting 
variances in the proof of Theorem 5. 

Proof of Theorem 3. Since $\hat S_n(\hat d_n) = 0$ [see (21)], a Taylor expansion 
of $\hat S_n$ around $d = \hat d_n$ yields 

(62)  $\hat S_n(d_0) = 2\log(2)(\hat d_n - d_0)\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,2^{-2j\tilde d_n}W_{j,k}^2$ 

for some $\tilde d_n$ between $d_0$ and $\hat d_n$. The proof of Theorem 3 now consists of 
bounding $\hat S_n(d_0)$ from above and showing that $\sum_{\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,2^{-2j\tilde d_n}W_{j,k}^2$, 
appropriately normalized, has a strictly positive limit. 

By the definitions of $\hat S_n$ [see (21)], $\tilde S_n$ [see (51)] and $\langle\mathcal I_n\rangle$ [see (19)], we 
have $\hat S_n(d_0) = \tilde S_n(\sigma^2\{j + L - \langle\mathcal I_n\rangle\}_{j\ge0})$. Since $L < \langle\mathcal I_n\rangle < L + 1$ for $n$ large 
enough [see (61)], the sequence $\sigma^2\{j + L - \langle\mathcal I_n\rangle\}_{j\ge0}$ belongs to $\mathcal{B}(\sigma^2,1,0)$ 
[see (50)], and Proposition 7, together with the Markov inequality, yields, 
as $n \to \infty$, 

(63)  $\hat S_n(d_0) = n2^{-L}\,O_P(H_{1,0}(n2^{-L}) + 2^{-\beta L}) = n2^{-L}\,O_P((n2^{-L})^{-1/2} + 2^{-\beta L})$, 

which is the desired upper bound. 

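We shall now show that the sum in (62) multiplied by $(n2^{-L})^{-1}$ has a strictly 
positive lower bound. Applying Proposition 9, we have 

$|\tilde d_n - d_0| \le |\hat d_n - d_0| = O_P((n2^{-L})^{-1/4} + 2^{-\beta L/2})$. 

Using the fact that $|2^{2j(d_0-\tilde d_n)} - 1| \le 2^{2j|d_0-\tilde d_n|} - 1 \le 2\log(2)\,j\,|d_0 - \tilde d_n|\,2^{2j|d_0-\tilde d_n|}$, 
we have that, on the event $\{|d_0 - \tilde d_n| \le 1/4\}$, 

$\Big|\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2\tilde d_nj}} - \sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2d_0j}}\Big|$ 

$\le 2\log(2)\,|d_0 - \tilde d_n|\,2^{2L|d_0-\tilde d_n|}\sum_{(j,k)\in\mathcal I_n} |j - \langle\mathcal I_n\rangle|\,j^2\,2^{(j-L)/2}\,\frac{W_{j,k}^2}{2^{2d_0j}}$. 

Using (8), (61), $j^2 = (j - L)^2 + 2(j - L)L + L^2$ and $n_j \le n2^{-j}$, there is a 
constant $C > 0$ such that 

$\mathbb{E}\sum_{(j,k)\in\mathcal I_n} |j - \langle\mathcal I_n\rangle|\,j^2\,2^{(j-L)/2}\,\frac{W_{j,k}^2}{2^{2d_0j}} \le Cn2^{-L}\sum_{j=L}^{U} |j - \langle\mathcal I_n\rangle|\,j^2\,2^{-(j-L)/2} = O(L^2n2^{-L})$. 

Hence, since $L^2(n2^{-L})^{-1/4} \to 0$, the last three displays yield, as $n \to \infty$, 

(64)  $\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2\tilde d_nj}} - \sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2d_0j}} = o_P(n2^{-L})$. 

We now write 

$\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2d_0j}} = \sigma^2\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\Big(\frac{W_{j,k}^2}{\sigma^2 2^{2d_0j}} - 1\Big) + \sigma^2\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j$. 

With the notation (51), the first term on the right-hand side is $\tilde S_n(\mu)$, where 
$\mu$ is the sequence $\sigma^2\{(j + L - \langle\mathcal I_n\rangle)(j + L)\}_{j\ge0}$. In view of (61), $|(j + L - 
\langle\mathcal I_n\rangle)(j + L)| \le j^2 + jL$, so the sequence $\mu$ is the sum of two sequences 
belonging to $\mathcal{B}(\sigma^2,2,0)$ and $\mathcal{B}(\sigma^2L,1,0)$, respectively. Applying Proposition 7 
together with the Markov inequality, we get that $\tilde S_n(\mu) = n2^{-L}\,O_P(H_{2,0}(n2^{-L}) + 
L\,H_{1,0}(n2^{-L})) = n2^{-L}\,o_P(1)$ since $L(n2^{-L})^{-1/2} \to 0$. Moreover, by Lemma 13, 
$\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j \sim (n2^{-L})(2 - 2^{-(U-L)})\kappa_{U-L}$ as $n \to \infty$. Hence, 

$\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2d_0j}} = (n2^{-L})\{\sigma^2(2 - 2^{-(U-L)})\kappa_{U-L} + o_P(1)\}$, 

and (64) and the previous display yield 

(65)  $\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)\,j\,\frac{W_{j,k}^2}{2^{2\tilde d_nj}} = (n2^{-L})\{\sigma^2(2 - 2^{-(U-L)})\kappa_{U-L} + o_P(1)\}$. 

Since $\kappa_{\ell} > 0$ for all $\ell \ge 1$ and $\kappa_{\ell} \to 2$ as $\ell \to \infty$ (see Lemma 13), and since 
we assumed $U - L \ge 1$, the sequence $(2 - 2^{-(U-L)})\kappa_{U-L}$ is bounded below 
by a positive constant, so (62), (63) and (65) imply (25). □ 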

Proof of Theorem 5. Define $f^*(0) := d\nu^*/d\lambda|_{\lambda=0}$. Since $\nu^* \in \mathcal{H}(\beta,\gamma,\varepsilon)$ 
and $\nu^*(-\varepsilon,\varepsilon) > 0$, we have $f^*(0) > 0$. Without loss of generality, we set 
$f^*(0) = 1$. By Corollary 2, conditions (8) and (9) hold with $\sigma^2 = K(d_0)$. 
Moreover, (31) implies that $L^{-1} + L^2(n2^{-L})^{-1/4} \to 0$, so we may apply (65), 
which, with (62), gives 

(66)  $(n2^{-L})^{1/2}(\hat d_n - d_0) = \frac{(n2^{-L})^{-1/2}\,\hat S_n(d_0)}{2\log(2)\,\sigma^2(2 - 2^{-(U-L)})\kappa_{U-L}}\,(1 + o_P(1))$. 

Define $\bar S_n$ as $\hat S_n$ in (21), but with the wavelet coefficients $\bar W_{j,k}$ defined in 
Corollary 2 replacing the wavelet coefficients $W_{j,k}$. Let us write 

(67)  $\hat S_n(d_0) = (\hat S_n(d_0) - \bar S_n(d_0)) + \mathbb{E}_f[\bar S_n(d_0)] + (\bar S_n(d_0) - \mathbb{E}_f[\bar S_n(d_0)])$. 

By Corollary 2, using Minkowski's and Markov's inequalities, (61), $n_j \le n2^{-j}$ 
and $d_0 + \alpha > (1 + \beta)/2$, we obtain, as $n \to \infty$, 

$\hat S_n(d_0) - \bar S_n(d_0) = o_P((n2^{-L})^{1/2})$. 

Since $\sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle) = 0$ and $\mathbb{E}_f[\bar W_{j,k}^2] = \sigma_j^2(\bar\nu)$, we may write 

$\mathbb{E}_f[\bar S_n(d_0)] = \sum_{(j,k)\in\mathcal I_n} (j - \langle\mathcal I_n\rangle)(2^{-2jd_0}\sigma_j^2(\bar\nu) - \sigma^2) = O(n2^{-(1+\beta)L}) = o((n2^{-L})^{1/2})$, 

where the O-term follows from (8), (61) and $n_j \le n2^{-j}$ and the o-term 
follows from (31). Using (66), (67) and the two last displays, we finally get 
that 

$(n2^{-L})^{1/2}(\hat d_n - d_0) = \frac{(n2^{-L})^{-1/2}(\bar S_n(d_0) - \mathbb{E}_f[\bar S_n(d_0)])}{2\log(2)\,\sigma^2(2 - 2^{-(U-L)})\kappa_{U-L}}\,(1 + o_P(1)) + o_P(1)$. 

Because $\bar f(\lambda) = |1 - e^{-i\lambda}|^{-2d_0}[f^*1_{[-\varepsilon,\varepsilon]}](\lambda)$ and $f^*1_{[-\varepsilon,\varepsilon]} \in \mathcal{H}(\beta,\gamma',\pi)$ for some 
$\gamma' > 0$, we may apply Proposition 10 below to determine the asymptotic 
behavior of $\bar S_n(d_0) - \mathbb{E}_f[\bar S_n(d_0)]$ as $n \to \infty$. Since $\sigma^2 = f^*(0)K(d_0)$ (Theorem 1), 
this yields the result and completes the proof. □ 

The following proposition provides a CLT when the condition on $\nu^*$ is 
global, namely $\nu^* \in \mathcal{H}(\beta,\gamma,\pi)$. It covers the cases (S-1), where $U - L \to \ell < 
\infty$ and (S-2), where $U - L \to \infty$. 

Proposition 10. Let $X$ be a Gaussian process having generalized 
spectral measure (2) with $d_0 \in \mathbb{R}$ and $\nu^* \in \mathcal{H}(\beta,\gamma,\pi)$, with $f^*(0) := d\nu^*/d\lambda|_{\lambda=0} > 
0$, where $\gamma > 0$ and $\beta \in (0,2]$. Let $L$ and $U$ be two sequences satisfying (23) 
and suppose that $L^{-1} + (n2^{-L})^{-1} \to 0$ and $U - L \to \ell \in \{1,2,\dots,\infty\}$ as 
$n \to \infty$. Then, as $n \to \infty$, 

(68)  $\frac{(n2^{-L})^{-1/2}\{\hat S_n(d_0) - \mathbb{E}[\hat S_n(d_0)]\}}{2\log(2)\,f^*(0)K(d_0)(2 - 2^{-(U-L)})\kappa_{U-L}} \to_{\mathcal L} \mathcal N(0, V(d_0,\ell))$, 

where $\kappa_{\ell}$ is defined in (28) and $V(d_0,\ell)$ in (29) for $\ell < \infty$ and $V(d_0,\infty)$ 
in (30). 

Proof. We take $f^*(0) = 1$, without loss of generality. As $n \to \infty$, since 
$U - L \to \ell$, we have $(2 - 2^{-(U-L)})\kappa_{U-L} \to (2 - 2^{-\ell})\kappa_{\ell}$ by setting, in the special 
case where $\ell = \infty$, $2^{-\infty} = 0$ and $\kappa_{\infty} = 2$; see Lemma 13. This gives the 
deterministic limit of the denominator in (68). The limit distribution of the 
numerator is obtained by applying Lemma 12 below. Let $A_n$ and $\Gamma_n$ be the square 
matrices indexed by the pairs $(j,k),(j',k') \in \mathcal I_n\times\mathcal I_n$ (in lexicographic order) 
and defined as follows: 

(1) $A_n$ is the diagonal matrix such that $[A_n]_{(j,k),(j,k)} = (n2^{-L})^{-1/2}\,\mathrm{sign}(j - 
\langle\mathcal I_n\rangle)$ for all $(j,k) \in \mathcal I_n$; 

(2) $\Gamma_n$ is the covariance matrix of the vector $[|j - \langle\mathcal I_n\rangle|^{1/2}2^{-d_0j}W_{j,k}]_{(j,k)\in\mathcal I_n}$. 

Let $\rho(A)$ denote the spectral radius of the square matrix $A$, that is, the 
maximum of the absolute value of its eigenvalues. Of course, $\rho[A_n] = (n2^{-L})^{-1/2}$. 
Moreover, $\rho[\Gamma_n] \le \sum_{j=L}^{U} \rho[\Gamma_{n,j}]$, where $\Gamma_{n,j}$ is the covariance matrix of the 
vector $[|j - \langle\mathcal I_n\rangle|^{1/2}2^{-d_0j}W_{j,k}]_{k=0,\dots,n_j-1}$. Since $\{W_{j,k}\}_{k\in\mathbb{Z}}$ is a stationary 
time series, by Lemma 11, 

$\rho[\Gamma_{n,j}] \le |j - \langle\mathcal I_n\rangle|\,2^{-2d_0j}\,2\pi\sup_{\lambda\in(-\pi,\pi)} D_{j,0}(\lambda;f)$. 

From (10), since $D_{\infty,0}(\cdot;d_0)$ is bounded on $(-\pi,\pi)$, we get, for a constant 
$C$ not depending on $n$, $\rho[\Gamma_n] \le C\sum_{j=L}^{U} |j - \langle\mathcal I_n\rangle|$. By (61), the latter sum 
is $O((U - L)^2)$. Hence, as $n \to \infty$, since $U - L \le J_n - L = O(\log(n2^{-L}))$, 
we have $\rho[A_n]\rho[\Gamma_n] = O((n2^{-L})^{-1/2}(U - L)^2) \to 0$, so the conditions of Lemma 12 
are met, provided that $(n2^{-L})^{-1}\mathrm{Var}(\hat S_n(d_0))$ has a finite limit. 

To conclude the proof, we need to compute this limit. In [14], Proposi- 
tion 2, it is shown that for all u = 0, 1, . . . , as j — > oo and rij — > oo, 

(69) c n (j,u) = 2- M ^n^ u Co V (a 2 ,aj. u 2) ^ A7Tl u (d ), 

where I n (d) is defined in (27) and a] = ± E^i" 1 W 2 k . Since S n (d ) = Yg =L (j - 
(2 n ))2 _2j ' do nja| ) we obtain 



(n2- L )- 1 Var(S n (d )) 
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= + L — (l n )) 2 2^-^^c n (L + i, 0) 

i=0 

70 

U-L i 



2 E Y.^ i + L -^)){i-u + L-{l n )) 



i=l u=l 



The Cauchy-Schwarz inequality, (45), (8) and $n_{j-u} \asymp n_j 2^{u}$ imply that
$|c_n(j,u)| \le C\, 2^{-2d_0 u + u/2}$, where $C$ is a positive constant. Using this bound,
(61) and $n_j \le n2^{-j}$ to bound the terms of the two series in the right-hand
side of (70) yields the following convergent series: $\sum_{i=0}^{\infty}(i+1)^2 2^{-i}$ and
$\sum_{i=1}^{\infty}\sum_{u=1}^{i}(i+1)(i-u+1)\,2^{-i+u/2}$. Using the assumptions on $U$ and $L$,
we have $n_{L+i} \sim n2^{-(L+i)}$ for any $i \ge 0$ and, by Lemma 13, $\langle\mathcal{I}_n\rangle - L \to \eta_\ell$ as
$n\to\infty$. Hence, by dominated convergence, (70) and (69) finally give that,
as $n\to\infty$, $(n2^{-L})^{-1}\mathrm{Var}(S_n(d_0))$ converges to



(71) $$4\pi\Bigl[I_0(d_0)\,\kappa_\ell\,(2 - 2^{-\ell}) + 2\sum_{1\le u\le i\le \ell} (i - \eta_\ell)(i - \eta_\ell - u)\, 2^{2d_0 u - i}\, I_u(d_0)\Bigr],$$



where in the case $\ell = \infty$, we have set $2^{-\infty} = 0$, $\eta_\infty = 1$ and $\kappa_\infty = 2$. Note
that the above bound on $|c_n(j,u)|$ and (69) imply that as $u\to\infty$,

(72) $$I_u(d_0) = O(2^{-2d_0 u + u/2}),$$

which confirms that the series in (71) is convergent for $\ell = \infty$. Finally,
dividing this variance by the squared limit of the denominator in (68), we get
the limit variance in (68), namely (29) and (30). □



The following lemmas were used in the proof of Proposition 10. 

Lemma 11. Let $\{\xi_k, k\in\mathbb{Z}\}$ be a stationary process with spectral density
$g$ and let $\Gamma_n$ be the covariance matrix of $[\xi_1, \dots, \xi_n]$. Then, $\rho(\Gamma_n) \le 2\pi\|g\|_\infty$.
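As a numerical illustration of Lemma 11 (not part of the original argument), the following sketch checks the bound on one example. It assumes an AR(1) process with coefficient `phi` and unit innovation variance, for which the autocovariance is $\phi^{|h|}/(1-\phi^2)$ and $2\pi\|g\|_\infty = (1-|\phi|)^{-2}$. The helper names, the matrix size and the use of power iteration are illustrative choices.

```python
import math

def ar1_autocov(phi, h):
    """Autocovariance at lag h of an AR(1) process with unit innovation variance."""
    return phi ** abs(h) / (1.0 - phi * phi)

def top_eigenvalue_psd(mat, iters=200):
    """Power iteration on a symmetric positive semidefinite matrix.

    Returns the Rayleigh quotient of the final iterate, which approximates
    the largest eigenvalue from below.
    """
    n = len(mat)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(row[j] * x[j] for j in range(n)) for row in mat]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    y = [sum(row[j] * x[j] for j in range(n)) for row in mat]
    return sum(x[i] * y[i] for i in range(n))

phi, n = 0.6, 40                      # illustrative values
gamma_n = [[ar1_autocov(phi, i - j) for j in range(n)] for i in range(n)]
rho = top_eigenvalue_psd(gamma_n)     # approximates rho(Gamma_n)
bound = 1.0 / (1.0 - abs(phi)) ** 2   # 2*pi*sup|g| for this AR(1) example
print(rho, bound)
```

The computed value stays below the bound for every `n`, which is the content of the lemma in this special case.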

Lemma 12. Let $\{\xi_n, n \ge 1\}$ be a sequence of Gaussian vectors with zero
mean and covariance $\Gamma_n$. Let $(A_n)_{n\ge 1}$ be a sequence of deterministic
symmetric matrices such that $\lim_{n\to\infty}\mathrm{Var}(\xi_n^T A_n \xi_n) = \sigma^2 \in [0,\infty)$. Assume that
$\lim_{n\to\infty}[\rho(A_n)\rho(\Gamma_n)] = 0$. Then, $\xi_n^T A_n \xi_n - E[\xi_n^T A_n \xi_n] \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}(0,\sigma^2)$.

Proof. The result is obvious if $\sigma = 0$, hence we may assume $\sigma > 0$.
Let $n \ge 1$, $k_n$ be the rank of $\Gamma_n$ and $Q_n$ denote an $n \times k_n$ full-rank
matrix such that $Q_n Q_n^T = \Gamma_n$. Let $\zeta_n \sim \mathcal{N}(0, I_{k_n})$, where $I_k$ is the identity
matrix of size $k \times k$. Then, for any $k_n \times k_n$ unitary matrix $U_n$, $U_n\zeta_n \sim \mathcal{N}(0, I_{k_n})$
and hence $Q_n U_n \zeta_n$ has the same distribution as $\xi_n$. Moreover,
since $A_n$ is symmetric, so is $Q_n^T A_n Q_n$. Choose $U_n$ to be a unitary
matrix such that $\Lambda_n := U_n^T (Q_n^T A_n Q_n) U_n$ is a diagonal matrix. Thus,
$\zeta_n^T \Lambda_n \zeta_n = (Q_n U_n \zeta_n)^T A_n (Q_n U_n \zeta_n)$ has the same distribution as $\xi_n^T A_n \xi_n$. Since $\Lambda_n$ is
diagonal, $\zeta_n^T \Lambda_n \zeta_n$ is a sum of independent r.v.'s of the form $\sum_{k=1}^{k_n} \lambda_{k,n}\zeta_{k,n}^2$,
where $(\zeta_{1,n}, \dots, \zeta_{k_n,n})$ are independent centered unit-variance Gaussian r.v.'s
and $\lambda_{k,n}$ are the diagonal entries of $\Lambda_n$. Note that $\sum_{k=1}^{k_n} \lambda_{k,n} = E[\xi_n^T A_n \xi_n]$.
To check the asymptotic normality, we verify that the Lindeberg conditions
hold for the sum of centered independent r.v.'s:
$\zeta_n^T \Lambda_n \zeta_n - E[\zeta_n^T \Lambda_n \zeta_n] = \sum_{k=1}^{k_n} \lambda_{k,n}(\zeta_{k,n}^2 - 1)$. Under the stated assumptions,

$$\sum_{k=1}^{k_n} \lambda_{k,n}^2\, E(\zeta_{k,n}^2 - 1)^2 = \mathrm{Var}(\xi_n^T A_n \xi_n) \to \sigma^2 \quad\text{as } n\to\infty$$

and $\rho(\Lambda_n) = \rho(Q_n^T A_n Q_n) \le \rho(A_n) \sup_{\|x\|=1}\|Q_n x\|^2 = \rho(A_n)\rho(\Gamma_n) \to 0$. Since
$\rho(\Lambda_n) = \max_{1\le k\le k_n}|\lambda_{k,n}|$, for all $\varepsilon > 0$,

$$\sum_{k=1}^{k_n} E\bigl[\lambda_{k,n}^2(\zeta_{k,n}^2 - 1)^2\, \mathbb{1}(|\lambda_{k,n}(\zeta_{k,n}^2 - 1)| > \varepsilon)\bigr] \le \Bigl(\sum_{k=1}^{k_n}\lambda_{k,n}^2\Bigr) E\bigl[(\zeta_{1,n}^2 - 1)^2\, \mathbb{1}(\rho(\Lambda_n)|\zeta_{1,n}^2 - 1| > \varepsilon)\bigr] \to 0 \quad\text{as } n\to\infty.$$

Hence, the Lindeberg conditions hold provided $\sigma > 0$. □
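The algebraic steps of this proof can be checked on a small example. The sketch below is illustrative only: the matrices `Q` and `A` are arbitrary choices, not taken from the paper. It uses the standard identity $\mathrm{Var}(\xi^T A\xi) = 2\,\mathrm{tr}((A\Gamma)^2)$ for a centered Gaussian vector $\xi$ with covariance $\Gamma$ and verifies that the eigenvalues $\lambda_k$ of $Q^T A Q$ satisfy $\sum_k\lambda_k = E[\xi^T A\xi]$, $2\sum_k\lambda_k^2 = \mathrm{Var}(\xi^T A\xi)$ and $\max_k|\lambda_k| \le \rho(A)\rho(\Gamma)$.

```python
import math

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def eig_sym_2x2(m):
    """Eigenvalues of a symmetric 2x2 matrix in closed form."""
    mean = 0.5 * (m[0][0] + m[1][1])
    rad = math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1])
    return mean - rad, mean + rad

# Illustrative inputs: Q is 3 x 2 with full rank, A is diagonal with entries +-c.
Q = [[1.0, 0.0], [0.5, 1.0], [0.2, -0.3]]
A = [[0.7, 0.0, 0.0], [0.0, -0.7, 0.0], [0.0, 0.0, 0.7]]

Gamma = matmul(Q, transpose(Q))            # covariance of xi = Q zeta
M = matmul(matmul(transpose(Q), A), Q)     # Q^T A Q, symmetric 2 x 2
lam = eig_sym_2x2(M)                       # diagonal entries of Lambda

AG = matmul(A, Gamma)
mean_qf = trace(AG)                        # E[xi^T A xi]
var_qf = 2.0 * trace(matmul(AG, AG))       # Var(xi^T A xi), Gaussian case

rho_A = max(abs(A[i][i]) for i in range(3))
# The nonzero eigenvalues of Q Q^T and Q^T Q coincide.
rho_Gamma = max(eig_sym_2x2(matmul(transpose(Q), Q)))
rho_Lambda = max(abs(v) for v in lam)
print(sum(lam), mean_qf, 2.0 * sum(v * v for v in lam), var_qf)
```

The check is deterministic. It illustrates why the condition $\rho(A_n)\rho(\Gamma_n) \to 0$ forces every weight $\lambda_{k,n}$ to become negligible.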

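Lemma 13. Let $p, \ell > 0$, $\eta_\ell$ and $\kappa_\ell$ be defined as in (28), $\langle\mathcal{I}\rangle$ as in (19)
and

$$\kappa[\mathcal{I}] := |\mathcal{I}|^{-1}\sum_{(j,k)\in\mathcal{I}} (j - \langle\mathcal{I}\rangle)^2 = |\mathcal{I}|^{-1}\sum_{(j,k)\in\mathcal{I}} j\,(j - \langle\mathcal{I}\rangle).$$

We have

(73) $$\eta_\ell = \frac{1 - 2^{-\ell}(1 + \ell/2)}{1 - 2^{-(\ell+1)}} \in (0,1), \qquad \lim_{\ell\to\infty}\eta_\ell = 1, \qquad \lim_{\ell\to\infty}\kappa_\ell = 2,$$

(74) $$\text{for all } u \ge 0, \qquad \lim_{\ell\to\infty} \frac{1}{\kappa_\ell}\sum_{i=0}^{\ell-u} \frac{2^{-i}}{2 - 2^{-\ell}}\,(i - \eta_\ell)(i + u - \eta_\ell) = 1$$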



and for all $n \ge 1$ and $0 \le j_0 \le j_1 \le J_n$,

(75) $$\Bigl|\sum_{j=j_0}^{j_1} (j - j_0)^p\, n_j - n2^{-j_0}\sum_{i=0}^{j_1-j_0} i^p\, 2^{-i}\Bigr| \le 2(T-1)(j_1 - j_0)^{p+1}.$$
Moreover, if $0 \le L_n \le J_n$ with $n2^{-L_n} \to \infty$ as $n \to \infty$, then

$$\sup_{j_1 = L_n,\dots,J_n} \bigl|\,|\mathcal{I}_n(L_n,j_1)| - n2^{-L_n}(2 - 2^{-(j_1-L_n)})\bigr| = O(\log(n2^{-L_n})),$$

$$\sup_{j_1 = L_n,\dots,J_n} \bigl|\langle\mathcal{I}_n(L_n,j_1)\rangle - L_n - \eta_{j_1-L_n}\bigr| = O(\log^2(n2^{-L_n})\,(n2^{-L_n})^{-1}),$$

$$\sup_{j_1 = L_n,\dots,J_n} \bigl|\kappa[\mathcal{I}_n(L_n,j_1)] - \kappa_{j_1-L_n}\bigr| = O(\log^3(n2^{-L_n})\,(n2^{-L_n})^{-1}).$$
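The closed form and the limits in (73) and (74) can be checked numerically. The sketch below assumes that (28), which is not reproduced in this section, defines $\eta_\ell$ and $\kappa_\ell$ as the mean and variance of $j \in \{0,\dots,\ell\}$ under the weights $2^{-j}/(2-2^{-\ell})$. That assumption agrees with the closed form in (73) and with the first term of (71). The function names are illustrative.

```python
def weights(ell):
    """Weights 2^{-j} / (2 - 2^{-ell}), j = 0..ell (assumed form of (28))."""
    total = 2.0 - 2.0 ** (-ell)
    return [2.0 ** (-j) / total for j in range(ell + 1)]

def eta(ell):
    """Weighted mean of j under the weights above."""
    return sum(j * w for j, w in enumerate(weights(ell)))

def kappa(ell):
    """Weighted variance of j under the weights above."""
    e = eta(ell)
    return sum((j - e) ** 2 * w for j, w in enumerate(weights(ell)))

def eta_closed(ell):
    """Closed form for eta_ell stated in (73)."""
    return (1.0 - 2.0 ** (-ell) * (1.0 + ell / 2.0)) / (1.0 - 2.0 ** (-(ell + 1)))

def ratio_74(ell, u):
    """Normalized sum whose limit as ell grows is stated in (74)."""
    e, w = eta(ell), weights(ell)
    s = sum(w[i] * (i - e) * (i + u - e) for i in range(ell - u + 1))
    return s / kappa(ell)

for ell in (1, 2, 5, 20, 60):
    print(ell, eta(ell), eta_closed(ell), kappa(ell), ratio_74(ell, min(ell, 3)))
```

For large `ell` the printed values approach 1, 1, 2 and 1 respectively, in line with the limits in (73) and (74) and with the conventions $\eta_\infty = 1$, $\kappa_\infty = 2$.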